Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Calculate monthly percentage change of daily basis data in Python

Let’s say I have a daily data as follows:

import pandas as pd
import numpy as np

np.random.seed(2021)
dates = pd.date_range('20130226', periods=90)
df = pd.DataFrame(np.random.randint(0, 100, size=(90, 3)), index=dates, columns=list('ABC'))
df

Out:

             A   B   C
2013-02-26  85  57   0
2013-02-27  94  86  44
2013-02-28  62  91  29
2013-03-01  21  93  24
2013-03-02  12  70  70
        ..  ..  ..
2013-05-22  57  13  81
2013-05-23  43  68  85
2013-05-24  55  50  53
2013-05-25  75  78  66
2013-05-26  70  93   3

For column A and B, I need to calculate their monthly pct change on daily basis, for example, the monthly pct change value of A for 2013-05-26 will be calculated by: A’s value in 2013-05-26 divided by the value in 2013-04-26 minus 1.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

My idea is like this: create new columns 'A1', 'B1' by shifting them one month forward, then df['A_MoM'] will be calculated by df['A']/df['A_shifted'] - 1, same logic for column B.

Since not all the months share same length of days, so I will use last day’s value of last months, ie., to calculate 2013-03-30’s pct change will be calculated by: 2013-03-30's value/2013-02-28's value - 1.

I tried the code below, but it generates a dataframe with all NaNs:

df[['A1', 'B1']] = df[['A', 'B']].shift(freq=pd.DateOffset(months=1)).resample('D').last().fillna(method=ffill)
df[['A_MoM', 'B_MoM']] = df[['A', 'B']].div(df[['A1', 'B1']], axis=0) - 1

Out:

             A  A1   B  B1
2013-02-26 NaN NaN NaN NaN
2013-02-27 NaN NaN NaN NaN
2013-02-28 NaN NaN NaN NaN
2013-03-01 NaN NaN NaN NaN
2013-03-02 NaN NaN NaN NaN
        ..  ..  ..  ..
2013-05-22 NaN NaN NaN NaN
2013-05-23 NaN NaN NaN NaN
2013-05-24 NaN NaN NaN NaN
2013-05-25 NaN NaN NaN NaN
2013-05-26 NaN NaN NaN NaN

How could achieve that correctly? Sincere thanks at advance.

Edit:

df = pd.DataFrame(np.random.randint(0, 100, size=(90, 3)), index=dates, columns=['A_values', 'B_values', 'C'])
df.columns

df1 = df.filter(regex='_values$').shift(freq=pd.DateOffset(months=1)).resample('D').last().ffill().add_suffix('_shifted')
df2 = df.filter(regex='_values$').div(df1.to_numpy(), axis=0) - 1
df.join(df2.add_suffix('_MoM'))

Out:

ValueError: Unable to coerce to DataFrame, shape must be (90, 2): given (93, 2)

>Solution :

Reason is different columns names, solution is converting df[['A1', 'B1']] to numpy array:

df[['A1', 'B1']] = df[['A', 'B']].shift(freq=pd.DateOffset(months=1)).resample('D').last().ffill()
df[['A_MoM', 'B_MoM']] = df[['A', 'B']].div(df[['A1', 'B1']].to_numpy(), axis=0) - 1
print (df)
             A   B   C    A1    B1      A_MoM     B_MoM
2013-02-26  85  57   0   NaN   NaN        NaN       NaN
2013-02-27  94  86  44   NaN   NaN        NaN       NaN
2013-02-28  62  91  29   NaN   NaN        NaN       NaN
2013-03-01  21  93  24   NaN   NaN        NaN       NaN
2013-03-02  12  70  70   NaN   NaN        NaN       NaN
        ..  ..  ..   ...   ...        ...       ...
2013-05-22  57  13  81  14.0  50.0   3.071429 -0.740000
2013-05-23  43  68  85   2.0  45.0  20.500000  0.511111
2013-05-24  55  50  53  89.0  52.0  -0.382022 -0.038462
2013-05-25  75  78  66  86.0  54.0  -0.127907  0.444444
2013-05-26  70  93   3   4.0  45.0  16.500000  1.066667

[90 rows x 7 columns]

Or if possible assign output to df1, so columns names are not changed, so possible divide with same columns names, here A, B correctly:

df1 = df[['A', 'B']].shift(freq=pd.DateOffset(months=1)).resample('D').last().ffill()
df[['A_MoM', 'B_MoM']] = df[['A', 'B']].div(df1, axis=0) - 1
print (df)
             A   B   C      A_MoM     B_MoM
2013-02-26  85  57   0        NaN       NaN
2013-02-27  94  86  44        NaN       NaN
2013-02-28  62  91  29        NaN       NaN
2013-03-01  21  93  24        NaN       NaN
2013-03-02  12  70  70        NaN       NaN
        ..  ..  ..        ...       ...
2013-05-22  57  13  81   3.071429 -0.740000
2013-05-23  43  68  85  20.500000  0.511111
2013-05-24  55  50  53  -0.382022 -0.038462
2013-05-25  75  78  66  -0.127907  0.444444
2013-05-26  70  93   3  16.500000  1.066667

[90 rows x 5 columns]
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading