Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas: Create a new column with the difference between every fifth cell in another column

I have a set of time series for different groups of financial institutions that runs from 2019-03 to 2021-12. The columns in the dataframes are quarter and capital buffer rate, as seen below.

Quarter Capital buffer rate (%)
2019-03 38
2019-06 43
2019-09 38
2019-12 37
2020-03 37
2020-06 37
2020-09 37
2020-12 35
2021-03 37
2021-06 41
2021-09 43
2021-12 44

I have five sets of these (for investment firms, banks etc). I would like to add a new columns to each dataframe, which has the percentage point change from year to year (e.g. 4 percentage points from 2020-06 to 2021-06). This means taking the difference between every fifth cell. For the quarters that don’t have preceding yearly values (2019-03, 2019-06, 2019-09 and 2019-12), I would like to add a null value.

I made it work using a fairly clunky for-loop, subtracting the value four cells above, adding the values to a list, and adding the list to the dataframe, like this:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel


for i in range(0, len(df)):
    if i <= 3:
        diff = np.nan
    else:  
        diff = df['Capital buffer rate'][i] - df['Capital buffer rate'][i-4]
    
    list.append(diff)

df['Percentage point change'] = list

But this doesn’t feel especially robust. And since I’m doing this for five different dataframes, I guess I have to create five variables in each loop and five lists? Is there maybe a simpler and better way of doing this?

>Solution :

I think you want this:

# set up problem
idx = pd.DatetimeIndex(pd.date_range('2019-03', '2022-01', freq='Q'))
df = pd.DataFrame({'Capital buffer rate (%)': [38, 43, 38, 37, 37, 37, 37, 35, 37, 41, 43, 44]}, index=idx)

# solution:
df['Percentage point change'] = df['Capital buffer rate (%)'] - df['Capital buffer rate (%)'].shift(4)

Output:

 Capital buffer rate (%) Percentage point change
2019-03-31  38  NaN
2019-06-30  43  NaN
2019-09-30  38  NaN
2019-12-31  37  NaN
2020-03-31  37  -1.0
2020-06-30  37  -6.0
2020-09-30  37  -1.0
2020-12-31  35  -2.0
2021-03-31  37  0.0
2021-06-30  41  4.0
2021-09-30  43  6.0
2021-12-31  44  9.0
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading