How can I determine the average of a column up to this point, using groupby?

I have the following data frame:

HorseId FGrating Average FGrating
1736 110 -1
1736 124 -1
1736 118 -1
13973 144 -1
13973 137 -1

I want to fill the Average FGrating column with the average FGrating of every horse up to the point it was computed, grouped by HorseId. The result that I am looking for is this:

HorseId FGrating Average FGrating
1736 110 110
1736 124 117 (110+124)/2
1736 118 117.3 (110+124+118)/3
13973 144 144
13973 137 140.5 (144+137)/2

The code I used to solve this problem is:

featured_data['Average FGrating'] = featured_data[['HorseId', 'FGrating']].groupby('HorseId')[
    'FGrating'].mean()

However, this fills in the average for only part of the data frame, not all of it.

What am I doing wrong? How can I solve this problem?

>Solution :

We can group by HorseId and take the cumulative sum (cumsum) of FGrating within each group. Dividing that by the number of rows seen so far gives the running average. Because cumcount starts at 0, we add 1 to it:

>>> df_grouped = df.groupby('HorseId')['FGrating']
>>> df['cum_sum'] = df_grouped.cumsum()
>>> df['cum_mean'] = df['cum_sum'] / (df_grouped.cumcount() + 1)
>>> df
   HorseId  FGrating  Average FGrating  cum_sum    cum_mean
0     1736       110                -1      110  110.000000
1     1736       124                -1      234  117.000000
2     1736       118                -1      352  117.333333
3    13973       144                -1      144  144.000000
4    13973       137                -1      281  140.500000
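
As a self-contained check of the running-average idea, the sketch below rebuilds the example frame from the question and writes the result straight into the Average FGrating column. The frame construction is an assumption based on the table in the question; the real featured_data presumably has more columns.

```python
import pandas as pd

# Example data copied from the question
df = pd.DataFrame({
    "HorseId": [1736, 1736, 1736, 13973, 13973],
    "FGrating": [110, 124, 118, 144, 137],
})

grouped = df.groupby("HorseId")["FGrating"]

# cumcount() is zero-based, so add 1 to get the number of rows seen so far
df["Average FGrating"] = grouped.cumsum() / (grouped.cumcount() + 1)

print(df.round(1))
```

Rounded to one decimal, the new column matches the values asked for in the question. Note that the rows are averaged in their current order, so the frame should be sorted chronologically within each horse first.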

Or, more briefly, with an expanding window:

df['cum_mean'] = (
    df.groupby('HorseId')['FGrating'].transform(lambda x: x.expanding().mean()))
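
As for what went wrong in the original attempt, a likely explanation is index alignment: groupby(...).mean() returns one value per horse, indexed by HorseId, and assigning that Series to a column matches on index labels rather than on position. The sketch below illustrates this on the example data from the question (the frame construction and the column name attempt are assumptions for illustration).

```python
import pandas as pd

# Example data copied from the question
df = pd.DataFrame({
    "HorseId": [1736, 1736, 1736, 13973, 13973],
    "FGrating": [110, 124, 118, 144, 137],
})

# groupby(...).mean() collapses each group to one value, indexed by HorseId
per_horse = df.groupby("HorseId")["FGrating"].mean()
print(per_horse)

# Column assignment aligns on index labels: row labels 0..4 match neither
# 1736 nor 13973, so no row receives a value here
df["attempt"] = per_horse
print(df)
```

In a larger frame, some row labels may happen to coincide with HorseId values, which would explain why only part of the column was filled. Even then, the value is the overall mean for that horse, not the average up to that row.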