How can I compute the rolling mean of a column for a set period of time, using Pandas and groupby, given a specific condition?

This is a variation of this question, only with a new column, Track:

Date Jockey ID Position Track
23-12-2018 4340 1 Sha Tin
25-11-2018 4340 5 Sha Tin
19-12-2018 4340 10 Happy Valley
01-01-2019 4340 3 Happy Valley
18-10-2017 8443 1 Sha Tin
18-02-2018 8443 6 Sha Tin
12-05-2018 8443 7 Happy Valley

What I am looking for is a way to compute the rolling mean final position, in the last 1000 days, on a specifically on Sha Tin. The final result:

Date Jockey ID Position Track Mean Position
23-12-2018 4340 1 Sha Tin 1 (1/1)
25-11-2018 4340 5 Sha Tin 3 (1+5)/2
19-12-2018 4340 10 Happy Valley
01-01-2019 4340 3 Happy Valley
18-10-2017 8443 1 Sha Tin 1 (1/1)
18-02-2018 8443 6 Sha Tin 3.5 (1+6)/2
12-05-2018 8443 7 Happy Valley

>Solution :

Add Track column to groupby:

df['new'] = (df.set_index('Date')
               .groupby(['Jockey ID','Track', pd.Grouper(freq='1000D')])['Position']
               .transform(lambda x: x.expanding().mean())
               .to_numpy())
print (df)

If need filter is possible, but output is different, for not matched are NaNs:

mask = df['Track'].eq('Sha Tin')

df.loc[mask, 'new'] = (df[mask].set_index('Date')
               .groupby(['Jockey ID','Track', pd.Grouper(freq='1000D')])['Position']
               .transform(lambda x: x.expanding().mean())
               .to_numpy())
print (df)

Leave a Reply