How can I compute the rolling mean of a column for a set period of time, using Pandas and groupby, given a specific condition?

This is a variation of this question, with one additional column, Track:

Date	Jockey ID	Position	Track
23-12-2018	4340	1	Sha Tin
25-11-2018	4340	5	Sha Tin
19-12-2018	4340	10	Happy Valley
01-01-2019	4340	3	Happy Valley
18-10-2017	8443	1	Sha Tin
18-02-2018	8443	6	Sha Tin
12-05-2018	8443	7	Happy Valley
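To make the answers below reproducible, here is a minimal sketch that builds the sample table as a DataFrame. It assumes the dates are day-first strings (dd-mm-yyyy). They are parsed with an explicit format so that a value like 01-01-2019 cannot be misread as month-first.

```python
import pandas as pd

# Sample data from the question; Date strings are day-first (dd-mm-yyyy)
df = pd.DataFrame({
    'Date': ['23-12-2018', '25-11-2018', '19-12-2018', '01-01-2019',
             '18-10-2017', '18-02-2018', '12-05-2018'],
    'Jockey ID': [4340, 4340, 4340, 4340, 8443, 8443, 8443],
    'Position': [1, 5, 10, 3, 1, 6, 7],
    'Track': ['Sha Tin', 'Sha Tin', 'Happy Valley', 'Happy Valley',
              'Sha Tin', 'Sha Tin', 'Happy Valley'],
})

# Convert to datetime64 so time-based grouping/rolling can use it
df['Date'] = pd.to_datetime(df['Date'], format='%d-%m-%Y')
print(df.dtypes)
```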

What I am looking for is a way to compute the rolling mean final position over the last 1000 days, specifically for races at Sha Tin. The final result:

Date	Jockey ID	Position	Track	Mean Position
23-12-2018	4340	1	Sha Tin	1 (1/1)
25-11-2018	4340	5	Sha Tin	3 (1+5)/2
19-12-2018	4340	10	Happy Valley
01-01-2019	4340	3	Happy Valley
18-10-2017	8443	1	Sha Tin	1 (1/1)
18-02-2018	8443	6	Sha Tin	3.5 (1+6)/2
12-05-2018	8443	7	Happy Valley

Solution:

Add the Track column to the groupby keys:

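import pandas as pd

# Dates are day-first strings; pd.Grouper(freq=...) needs a DatetimeIndex
df['Date'] = pd.to_datetime(df['Date'], format='%d-%m-%Y')

df['new'] = (df.set_index('Date')
               .groupby(['Jockey ID', 'Track', pd.Grouper(freq='1000D')])['Position']
               .transform(lambda x: x.expanding().mean())
               .to_numpy())
print(df)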
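Note that pd.Grouper(freq='1000D') splits the dates into fixed, non-overlapping 1000-day bins, and the expanding mean then runs in row order inside each bin. If a trailing "last 1000 days" window per race is what is wanted, a sketch using a time-based rolling window instead might look like the following. It assumes the rows can be sorted by date first and that the hypothetical output column is named `Mean 1000D`. Because it accumulates in date order, the values differ from the question's table, which accumulates in row order.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['23-12-2018', '25-11-2018', '19-12-2018', '01-01-2019',
             '18-10-2017', '18-02-2018', '12-05-2018'],
    'Jockey ID': [4340, 4340, 4340, 4340, 8443, 8443, 8443],
    'Position': [1, 5, 10, 3, 1, 6, 7],
    'Track': ['Sha Tin', 'Sha Tin', 'Happy Valley', 'Happy Valley',
              'Sha Tin', 'Sha Tin', 'Happy Valley'],
})
df['Date'] = pd.to_datetime(df['Date'], format='%d-%m-%Y')

# Offset-based rolling windows need a monotonic date index inside each group
df = df.sort_values('Date', kind='mergesort')

# For each (jockey, track), average the positions whose dates fall in the
# trailing window (t - 1000 days, t]; transform keeps the sorted row order
df['Mean 1000D'] = (df.set_index('Date')
                      .groupby(['Jockey ID', 'Track'])['Position']
                      .transform(lambda s: s.rolling('1000D').mean())
                      .to_numpy())

# Restore the original row order
df = df.sort_index()
print(df)
```

Using `rolling('1000D')` rather than a fixed `Grouper` means every race looks back exactly 1000 days from its own date, so two races near a bin boundary are not split apart.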

If you need to restrict the calculation to Sha Tin, filter first. This is possible, but the output is different: rows that do not match the filter get NaN in the new column:

# assumes df['Date'] has already been converted with pd.to_datetime as above
mask = df['Track'].eq('Sha Tin')

df.loc[mask, 'new'] = (df[mask].set_index('Date')
                         .groupby(['Jockey ID', 'Track', pd.Grouper(freq='1000D')])['Position']
                         .transform(lambda x: x.expanding().mean())
                         .to_numpy())
print(df)
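The same filtering idea can be combined with a trailing time-based window. The sketch below is an assumption about what "last 1000 days on Sha Tin" means: it computes a per-jockey `rolling('1000D')` mean over Sha Tin races only, in date order, and writes it back by row label so that Happy Valley rows stay NaN. The column name `Mean Position` follows the question's table.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['23-12-2018', '25-11-2018', '19-12-2018', '01-01-2019',
             '18-10-2017', '18-02-2018', '12-05-2018'],
    'Jockey ID': [4340, 4340, 4340, 4340, 8443, 8443, 8443],
    'Position': [1, 5, 10, 3, 1, 6, 7],
    'Track': ['Sha Tin', 'Sha Tin', 'Happy Valley', 'Happy Valley',
              'Sha Tin', 'Sha Tin', 'Happy Valley'],
})
df['Date'] = pd.to_datetime(df['Date'], format='%d-%m-%Y')

# Keep only Sha Tin races, sorted by date for the time-based window
mask = df['Track'].eq('Sha Tin')
sha_tin = df[mask].sort_values('Date', kind='mergesort')

# Trailing 1000-day mean per jockey, computed on Sha Tin rows only
means = (sha_tin.set_index('Date')
                .groupby('Jockey ID')['Position']
                .transform(lambda s: s.rolling('1000D').mean()))

# Re-attach the original row labels; assignment aligns on them, so
# rows not in sha_tin (Happy Valley) are left as NaN
df['Mean Position'] = pd.Series(means.to_numpy(), index=sha_tin.index)
print(df)
```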
