How to calculate percentage change within a group and across a group?

August 21, 2023

I have seen multiple posts asking this question but have been unable to get the to work with my dataset.

I have a dataframe containing summary information (output from a .groupby and describe() operation).

data = [
    [123456, "2017", 8.0, 150.235],
    [123456, "2018", 8.0, 202.5],
    [123456, "2019", 7.0, 168.526],
    [123456, "2020", 6.0, 175.559],
    [123456, "2021", 8.0, 206.667],
    [789101, "2017", 8.0, 228.9],
    [789101, "2018", 5.0, 208]
]

df = pd.DataFrame(
    data,
    columns=[
        "ID",
        "year",
        "count",
        "mean",
    ],
)
df

I am trying to do 2 things:

Calculate the percentage change within ID from year to year.
Calculate the percentage change across ID, i.e from the first year to the latest year what is the overall percentage change.

Things I’ve tried:

df["pct_ch"] = df.groupby(["ID", "year"])["mean"].apply(pd.Series.pct_change) + 1 as suggested here, unfortunately this returns a column of all NaN.
df["pct_change"] = df.groupby(["ID", "year"])["mean"].pct_change(-1) as suggested here, which also returns a column of all NaN values.

I am curious what I am doing wrong (why am I getting all NaN and how I can calculate these percentage change columns?

Ideal output would look something like this:

>Solution :

You shouldn’t group by year, and you can just groupby.pct_change directly:

df['pct_ch'] = df.groupby('ID')['mean'].pct_change()

By setting year as group, you groups contain only one row, and there is nothing to compute.

Output:

       ID  year  count     mean    pct_ch
0  123456  2017    8.0  150.235       NaN
1  123456  2018    8.0  202.500  0.347888
2  123456  2019    7.0  168.526 -0.167773
3  123456  2020    6.0  175.559  0.041732
4  123456  2021    8.0  206.667  0.177194
5  789101  2017    8.0  228.900       NaN
6  789101  2018    5.0  208.000 -0.091306

For the overall percentage change use a custom function:

df['pct_ch_overall'] = df.groupby('ID')['mean']. transform(lambda g: g.iloc[[0, -1]].pct_change())

Output:

       ID  year  count     mean    pct_ch  pct_ch_overall
0  123456  2017    8.0  150.235       NaN             NaN
1  123456  2018    8.0  202.500  0.347888             NaN
2  123456  2019    7.0  168.526 -0.167773             NaN
3  123456  2020    6.0  175.559  0.041732             NaN
4  123456  2021    8.0  206.667  0.177194        0.375625
5  789101  2017    8.0  228.900       NaN             NaN
6  789101  2018    5.0  208.000 -0.091306       -0.091306