Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Group by ID and select rows with the newest DATE1 and the oldest DATE2

I have dataframe:

ID  DATE_1      DATE_2
12  01.01.2015  01.01.2021
12  01.01.2016  01.01.2021
12  01.01.2017  01.01.2019
12  01.01.2012  01.01.2021
13  01.01.2011  01.01.2021
13  01.01.2014  01.01.2020
13  01.01.2010  01.01.2018
13  01.01.2013  01.01.2016
14  01.01.2012  01.01.2021
14  01.01.2016  01.01.2020
14  01.01.2017  01.01.2021
14  01.01.2018  01.01.2021

I want to group by ID and leave records with the newest DATE_2 and the oldest DATE_1. How to do it?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

IIUC, this is a classical groupby+agg. You need to set the dates to a datetime type for meaningful comparisons:

(df
 .assign(DATE_1=pd.to_datetime(df['DATE_1']),
         DATE_2=pd.to_datetime(df['DATE_2'])
        )
 .groupby('ID')
 .agg({'DATE_1': 'min', 'DATE_2': 'max'})
)

output:

       DATE_1     DATE_2
ID                      
12 2012-01-01 2021-01-01
13 2010-01-01 2021-01-01
14 2012-01-01 2021-01-01
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading