I have a data frame as follows:
data1 month day
20 1 1
10 1 1
15 1 2
12 1 2
16 1 3
10 1 3
20 2 1
10 2 1
15 2 2
10 2 2
12 2 3
10 2 3
I want to find the sum of data for each day of each month and display the result as a dataframe similar to the following:
date sum_data1
1.1. 30
2.1. 27
3.1. 26
1.2. 30
2.2. 25
3.2. 22
The data set is quite big > 200,000 rows.
>Solution :
Because no column year first add it to month and day, pass to to_datetime and aggregate sum:
date = pd.to_datetime(df[['month','day']].assign(year=2022))
df = df.groupby(date.rename('date'))['data1'].sum().reset_index(name='sum_data1')
print (df)
date sum_data1
0 2022-01-01 30
1 2022-01-02 27
2 2022-01-03 26
3 2022-02-01 30
4 2022-02-02 25
5 2022-02-03 22