I am trying to group a dataset based on the name and find the monthly average. i.e sum all the values for each name divided by the number of the distinct month for each name.
For example,
name time values
A 2011-01-17 10
B 2011-02-17 20
A 2011-01-11 10
A 2011-03-17 30
B 2011-02-17 10
The expected result is
name monthly_avg
A 25
B 30
I have tried
data.groupby(['name'])['values'].mean().reset_index(name='Monthly Average')
but it gives the output below instead of my desired output above:
name Monthly Average
A 16.666667
B 15.000000
>Solution :
Convert values to datetimes first, then aggregate sum per name and months by Grouper and last get mean per first level name:
data['time'] = pd.to_datetime(data['time'])
df = (data.groupby(['name', pd.Grouper(freq='m', key='time')])['values'].sum()
.groupby(level=0)
.mean()
.reset_index(name='Monthly Average'))
print (df)
name Monthly Average
0 A 25
1 B 30
With months period solution is if change Grouper to Series.dt.to_period:
data['time'] = pd.to_datetime(data['time'])
df = (data.groupby(['name', data['time'].dt.to_period('m')])['values']
.sum()
.groupby(level=0)
.mean()
.reset_index(name='Monthly Average'))
print (df)
name Monthly Average
0 A 25
1 B 30