Home Pandas: Create a new column with the difference between every fifth cell in another column

Questions

Pandas: Create a new column with the difference between every fifth cell in another column

July 14, 2022

I have a set of time series for different groups of financial institutions that runs from 2019-03 to 2021-12. The columns in the dataframes are quarter and capital buffer rate, as seen below.

Quarter	Capital buffer rate (%)
2019-03	38
2019-06	43
2019-09	38
2019-12	37
2020-03	37
2020-06	37
2020-09	37
2020-12	35
2021-03	37
2021-06	41
2021-09	43
2021-12	44

I have five sets of these (for investment firms, banks etc). I would like to add a new columns to each dataframe, which has the percentage point change from year to year (e.g. 4 percentage points from 2020-06 to 2021-06). This means taking the difference between every fifth cell. For the quarters that don’t have preceding yearly values (2019-03, 2019-06, 2019-09 and 2019-12), I would like to add a null value.

I made it work using a fairly clunky for-loop, subtracting the value four cells above, adding the values to a list, and adding the list to the dataframe, like this:


for i in range(0, len(df)):
    if i <= 3:
        diff = np.nan
    else:  
        diff = df['Capital buffer rate'][i] - df['Capital buffer rate'][i-4]
    
    list.append(diff)

df['Percentage point change'] = list

But this doesn’t feel especially robust. And since I’m doing this for five different dataframes, I guess I have to create five variables in each loop and five lists? Is there maybe a simpler and better way of doing this?

>Solution :

I think you want this:

# set up problem
idx = pd.DatetimeIndex(pd.date_range('2019-03', '2022-01', freq='Q'))
df = pd.DataFrame({'Capital buffer rate (%)': [38, 43, 38, 37, 37, 37, 37, 35, 37, 41, 43, 44]}, index=idx)

# solution:
df['Percentage point change'] = df['Capital buffer rate (%)'] - df['Capital buffer rate (%)'].shift(4)

Output:

 Capital buffer rate (%) Percentage point change
2019-03-31  38  NaN
2019-06-30  43  NaN
2019-09-30  38  NaN
2019-12-31  37  NaN
2020-03-31  37  -1.0
2020-06-30  37  -6.0
2020-09-30  37  -1.0
2020-12-31  35  -2.0
2021-03-31  37  0.0
2021-06-30  41  4.0
2021-09-30  43  6.0
2021-12-31  44  9.0