Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas aggregate order for monthly/weekly changes

I am retrieving data from a SQL database using django, and then convert it to a pandas dataframe.
The goal is to calculate some metrics for each person well as metrics overall across all persons.

After calculating my metrics, I want to compare the values to the previous month and calculate the changes as percentage.

This works roughly like this:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

member_frame = member_frame.truncate(before=(previous_start - timedelta(days=1)))

which gives me all entries from the last month up to including the end of this month.

I then calculate the sum of some values:

aggregate_avgs = member_frame.groupby(member_frame.index.month).mean().replace(np.inf, 1).fillna(0)

And finally calculate the percentage changes:

avgs_change = aggregate_avgs.pct_change().replace(np.inf, 1).fillna(-1)

Which, if we look at the changes from january to february, like that:

target_date
1           -1.000000 -1.000000 -1.000000           -1.0 -1.000000 -1.000000  -1.000000
2            0.142857  0.076923 -0.828571            0.0 -0.587774 -0.699994  -0.507199

This works… fine? Since I am only interested in the changes from the first to the second date, I simply take the values from the second row.

But if I do the same for december and january, I get this:

target_date
1           -1.00000 -1.000000 -1.000000           -1.0 -1.000000 -1.000000  -1.000000
12           0.02682  0.273444  0.449811            0.0  0.259059  0.424178   0.223225

As you can see, this is somewhat reversed, and the changes are actually for the wrong month.
The target_date index is a datetime index.

I suspect that the same will happen if I do that with weekly changes, when crossing years.

What am I doing wrong?
I am quite new to pandas, so it might as well be a very easy mistake.

>Solution :

I think you could write aggregate_avgs = member_frame.groupby(pd.Grouper(freq="M")).mean() as is described here.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading