How can I determine the average of a column up to this point, using groupby?

November 13, 2021

I have the following data frame:

HorseId	FGrating	Average FGrating
1736	110	-1
1736	124	-1
1736	118	-1
13973	144	-1
13973	137	-1

I want to fill the Average FGrating column with the average FGrating of every horse up to the point it was computed, grouped by HorseId. The result that I am looking for is this:

HorseId	FGrating	Average FGrating
1736	110	110
1736	124	117 (110+124)/2
1736	118	117.3 (110+124+118)/3
13973	144	144
13973	137	140.5 (144+137)/2

The code I used to solve this problem is:

featured_data['Average FGrating'] = featured_data[['HorseId', 'FGrating']].groupby('HorseId')[
    'FGrating'].mean()

However, it computes the average for only part of the data frame, not all of it.

What am I doing wrong? How can I solve this problem?

>Solution :
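
As for what goes wrong in the attempt from the question: `groupby('HorseId')['FGrating'].mean()` returns one value per horse, indexed by HorseId rather than by the original row labels. Assigning that Series to a column aligns on the index, so a row only receives a value if its row label happens to equal a HorseId. A minimal sketch, assuming the sample data from the question and a default 0..4 row index (the name `per_horse` is only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "HorseId": [1736, 1736, 1736, 13973, 13973],
    "FGrating": [110, 124, 118, 144, 137],
})

# One mean per horse; the result is indexed by HorseId, not by row label.
per_horse = df.groupby("HorseId")["FGrating"].mean()
print(per_horse.index.tolist())

# Column assignment aligns on the index. Row labels 0..4 match no HorseId,
# so in this small sample every row ends up as NaN.
df["Average FGrating"] = per_horse
print(df["Average FGrating"].isna().all())
```

In a larger frame some row labels may coincide with HorseId values, which would explain why only part of the column gets filled.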

We can start by grouping by HorseId and then take the cumsum of FGrating. To get the running average, we just need to divide the cumsum by the number of rows seen so far in each group, which is cumcount() + 1 because cumcount starts at 0:

>>> df_grouped = df.groupby('HorseId')['FGrating']
>>> df['cum_sum'] = df_grouped.cumsum()
>>> df['cum_mean'] = df['cum_sum'] / (df_grouped.cumcount() + 1)
>>> df
    HorseId     FGrating    Average FGrating    cum_sum     cum_mean
0   1736        110         -1                  110         110.000000
1   1736        124         -1                  234         117.000000
2   1736        118         -1                  352         117.333333
3   13973       144         -1                  144         144.000000
4   13973       137         -1                  281         140.500000

Or we can also do it this way (shorter):

df['cum_mean'] = (
    df.groupby('HorseId')['FGrating'].transform(lambda x: x.expanding().mean()))
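
To check a result against the expected table in the question, here is a self-contained sketch. It rebuilds the sample frame and computes a running mean that includes the current row in two ways. The column names `cum_mean` and `cum_mean_alt` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "HorseId": [1736, 1736, 1736, 13973, 13973],
    "FGrating": [110, 124, 118, 144, 137],
})

grouped = df.groupby("HorseId")["FGrating"]

# Running sum divided by the running row count within each horse.
# cumcount() starts at 0, hence the + 1.
df["cum_mean"] = grouped.cumsum() / (grouped.cumcount() + 1)

# Same idea with an expanding window; transform keeps the original index,
# so the assignment lines up row by row.
df["cum_mean_alt"] = grouped.transform(lambda s: s.expanding().mean())

print(df.round(1))
```

Both columns should agree with the expected values (110, 117, 117.3, 144, 140.5). Note that applying `shift()` before the cumulative step would instead average only the earlier rows of each horse, leaving the current row out.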

dataframe

byMR
