Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to rename duplicate Columns in Pandas groupby result

I have a dataframe as such:

  DF = pd.DataFrame(
   {
       'Application Date': ['2023-06-03', '2023-05-15', '2022-02-10', '2021-12-30', '2022-09-01'],
       'Application ID': ['1', '2', '3', '4', '5']
   }
)
DF['Application Date'] = pd.to_datetime(DF['Application Date'], format='%Y-%m-%d')
DF.dtypes

I wish to find the number of applications by year and month for which I used the following piece of code

      DF.groupby(
         [DF['Application Date'].dt.year, DF['Application Date'].dt.month]
     ).size()

These result has two columns "Application Date"

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

enter image description here

How can I rename the first column as "Application Year", second as "Application Month" and last as "No of Applicants"

>Solution :

You can use rename for the grouper columns.

Then if you want to keep them as index and have one column named 'No of Applicants', use agg with a keyword argument using a dict with the new column name as key and a tuple of ‘(column, aggfunc)’ as value:

DF = DF.groupby(
    [
        DF["Application Date"].dt.year.rename("Application Year"),
        DF["Application Date"].dt.month.rename("Application Month"),
    ]
).agg(**{"No of Applicants": ("Application ID", "size")})
                                    No of Applicants
Application Year Application Month                  
2021             12                                1
2022             2                                 1
                 9                                 1
2023             5                                 1
                 6                                 1

Or you can reset_index specifying a name and have all three as columns:

DF = (
    DF.groupby(
        [
            DF["Application Date"].dt.year.rename("Application Year"),
            DF["Application Date"].dt.month.rename("Application Month"),
        ]
    )
    .size()
    .reset_index(name="No of Applicants")
)
   Application Year  Application Month  No of Applicants
0              2021                 12                 1
1              2022                  2                 1
2              2022                  9                 1
3              2023                  5                 1
4              2023                  6                 1
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading