Pandas aggregate order for monthly/weekly changes

February 7, 2022

I am retrieving data from a SQL database using django, and then convert it to a pandas dataframe.
The goal is to calculate some metrics for each person well as metrics overall across all persons.

After calculating my metrics, I want to compare the values to the previous month and calculate the changes as percentage.

This works roughly like this:

member_frame = member_frame.truncate(before=(previous_start - timedelta(days=1)))

which gives me all entries from the last month up to including the end of this month.

I then calculate the sum of some values:

aggregate_avgs = member_frame.groupby(member_frame.index.month).mean().replace(np.inf, 1).fillna(0)

And finally calculate the percentage changes:

avgs_change = aggregate_avgs.pct_change().replace(np.inf, 1).fillna(-1)

Which, if we look at the changes from january to february, like that:

target_date
1           -1.000000 -1.000000 -1.000000           -1.0 -1.000000 -1.000000  -1.000000
2            0.142857  0.076923 -0.828571            0.0 -0.587774 -0.699994  -0.507199

This works… fine? Since I am only interested in the changes from the first to the second date, I simply take the values from the second row.

But if I do the same for december and january, I get this:

target_date
1           -1.00000 -1.000000 -1.000000           -1.0 -1.000000 -1.000000  -1.000000
12           0.02682  0.273444  0.449811            0.0  0.259059  0.424178   0.223225

As you can see, this is somewhat reversed, and the changes are actually for the wrong month.
The target_date index is a datetime index.

I suspect that the same will happen if I do that with weekly changes, when crossing years.

What am I doing wrong?
I am quite new to pandas, so it might as well be a very easy mistake.