Pandas expanding average per group without changing the order of the rows

April 20, 2023

I have the following dataframe "df1":

       a   b
week        
9     47  19
9     36  44
10    29  46
10   -68 -37
10   -12  90
9     67  66

I want to calculate the expanding average of each column, conditional on the index. This is the expected output:

         a     b
week            
9     47.0  19.0
9     41.5  31.5
10    29.0  46.0
10   -19.5   4.5
10   -17.0  33.0
9     50.0  43.0

For example, the last value of column "b" (43) is the average of all values of column "b" of df1 where index = 9 (66, 44, 19).

I tried the following but obviously didn’t work:

df1[df1.index == df1.index].expanding().mean()

Thanks

>Solution :

If you want to simply find the expanding mean per group, the suggested duplicate is sufficient.

However if you want to retain the existing order of your original, some additional processing is needed:

(df.reset_index()
   .groupby('week')
   .expanding()
   .mean()
   .sort_index(level=1)
   .reset_index(level=1, drop=True))

         a     b
week            
9     47.0  19.0
9     41.5  31.5
10    29.0  46.0
10   -19.5   4.5
10   -17.0  33.0
9     50.0  43.0

By first resetting the index and then grouping by week, we get an intermediate groupby output that preserves the original monotonically increasing indices, by which we can then sort by index and throw it away.

Explanation

First, reset the index

df.reset_index()

   week   a   b
0     9  47  19
1     9  36  44
2    10  29  46
3    10 -68 -37
4    10 -12  90
5     9  67  66

Compute expanding mean per group by week

_.groupby('week').expanding().mean()
           a     b
week              
9    0  47.0  19.0
     1  41.5  31.5
     5  50.0  43.0
10   2  29.0  46.0
     3 -19.5   4.5
     4 -17.0  33.0

Notice the multiIndex in the output – we will use the first level to get our original sort order back:

_.sort_index(level=1)
           a     b
week              
9    0  47.0  19.0
     1  41.5  31.5
10   2  29.0  46.0
     3 -19.5   4.5
     4 -17.0  33.0
9    5  50.0  43.0

Success! Now, throw it away

_.reset_index(level=1, drop=True)
         a     b
week            
9     47.0  19.0
9     41.5  31.5
10    29.0  46.0
10   -19.5   4.5
10   -17.0  33.0
9     50.0  43.0

A slightly shorter version of this avoids the additional postprocessing but the original ordering is lost:

df.groupby(level=0).expanding().mean().reset_index(level=1, drop=True)
         a     b
week            
9     47.0  19.0
9     41.5  31.5
9     50.0  43.0
10    29.0  46.0
10   -19.5   4.5
10   -17.0  33.0

Back in the day, you could use as_index=False and/or group_keys=False to avoid the final reset_index call but that doesn’t seem to be the case anymore.