I have a dataframe as such:
DF = pd.DataFrame(
{
'Application Date': ['2023-06-03', '2023-05-15', '2022-02-10', '2021-12-30', '2022-09-01'],
'Application ID': ['1', '2', '3', '4', '5']
}
)
DF['Application Date'] = pd.to_datetime(DF['Application Date'], format='%Y-%m-%d')
DF.dtypes
I wish to find the number of applications by year and month for which I used the following piece of code
DF.groupby(
[DF['Application Date'].dt.year, DF['Application Date'].dt.month]
).size()
These result has two columns "Application Date"
How can I rename the first column as "Application Year", second as "Application Month" and last as "No of Applicants"
>Solution :
You can use rename for the grouper columns.
Then if you want to keep them as index and have one column named 'No of Applicants', use agg with a keyword argument using a dict with the new column name as key and a tuple of ‘(column, aggfunc)’ as value:
DF = DF.groupby(
[
DF["Application Date"].dt.year.rename("Application Year"),
DF["Application Date"].dt.month.rename("Application Month"),
]
).agg(**{"No of Applicants": ("Application ID", "size")})
No of Applicants
Application Year Application Month
2021 12 1
2022 2 1
9 1
2023 5 1
6 1
Or you can reset_index specifying a name and have all three as columns:
DF = (
DF.groupby(
[
DF["Application Date"].dt.year.rename("Application Year"),
DF["Application Date"].dt.month.rename("Application Month"),
]
)
.size()
.reset_index(name="No of Applicants")
)
Application Year Application Month No of Applicants
0 2021 12 1
1 2022 2 1
2 2022 9 1
3 2023 5 1
4 2023 6 1
