Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas expanding average per group without changing the order of the rows

I have the following dataframe "df1":

       a   b
week        
9     47  19
9     36  44
10    29  46
10   -68 -37
10   -12  90
9     67  66

I want to calculate the expanding average of each column, conditional on the index. This is the expected output:

         a     b
week            
9     47.0  19.0
9     41.5  31.5
10    29.0  46.0
10   -19.5   4.5
10   -17.0  33.0
9     50.0  43.0

For example, the last value of column "b" (43) is the average of all values of column "b" of df1 where index = 9 (66, 44, 19).

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I tried the following but obviously didn’t work:

df1[df1.index == df1.index].expanding().mean()

Thanks

>Solution :

If you want to simply find the expanding mean per group, the suggested duplicate is sufficient.

However if you want to retain the existing order of your original, some additional processing is needed:

(df.reset_index()
   .groupby('week')
   .expanding()
   .mean()
   .sort_index(level=1)
   .reset_index(level=1, drop=True))

         a     b
week            
9     47.0  19.0
9     41.5  31.5
10    29.0  46.0
10   -19.5   4.5
10   -17.0  33.0
9     50.0  43.0 

By first resetting the index and then grouping by week, we get an intermediate groupby output that preserves the original monotonically increasing indices, by which we can then sort by index and throw it away.


Explanation

First, reset the index

df.reset_index()

   week   a   b
0     9  47  19
1     9  36  44
2    10  29  46
3    10 -68 -37
4    10 -12  90
5     9  67  66

Compute expanding mean per group by week

_.groupby('week').expanding().mean()
           a     b
week              
9    0  47.0  19.0
     1  41.5  31.5
     5  50.0  43.0
10   2  29.0  46.0
     3 -19.5   4.5
     4 -17.0  33.0

Notice the multiIndex in the output – we will use the first level to get our original sort order back:

_.sort_index(level=1)
           a     b
week              
9    0  47.0  19.0
     1  41.5  31.5
10   2  29.0  46.0
     3 -19.5   4.5
     4 -17.0  33.0
9    5  50.0  43.0

Success! Now, throw it away

_.reset_index(level=1, drop=True)
         a     b
week            
9     47.0  19.0
9     41.5  31.5
10    29.0  46.0
10   -19.5   4.5
10   -17.0  33.0
9     50.0  43.0

A slightly shorter version of this avoids the additional postprocessing but the original ordering is lost:

df.groupby(level=0).expanding().mean().reset_index(level=1, drop=True)
         a     b
week            
9     47.0  19.0
9     41.5  31.5
9     50.0  43.0
10    29.0  46.0
10   -19.5   4.5
10   -17.0  33.0

Back in the day, you could use as_index=False and/or group_keys=False to avoid the final reset_index call but that doesn’t seem to be the case anymore.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading