Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Python Dataframe – only keep oldest records from each month

I have a Pandas Dataframe with a date column. I want to only have the oldest records for each month and remove any records that came before. There will be duplicates and I want to keep them. I also need a new column with only the month and year.

Input

Provider date
Apple 01/01/2022
Apple 05/01/2022
Apple 20/01/2022
Apple 20/01/2022
Apple 05/02/2022
Apple 10/02/2022

Output:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Provider date month_year
Apple 20/01/2022 01/2022
Apple 20/01/2022 01/2022
Apple 10/02/2022 02/2022

>Solution :

Create column month_year with Series.dt.strftime and then compare maximal datetimes per groups by original date column in GroupBy.transform and filter in boolean indexing:

df['date'] = pd.to_datetime(df['date'], dayfirst=True)

df = df.assign(month_year = df['date'].dt.strftime('%m/%Y'))

df = df[df.groupby(['Provider', 'month_year'])['date'].transform('max').eq(df['date'])]
print (df)
  Provider       date month_year
2    Apple 2022-01-20    01/2022
3    Apple 2022-01-20    01/2022
5    Apple 2022-02-10    02/2022
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading