I have dataframe:
ID DATE_1 DATE_2
12 01.01.2015 01.01.2021
12 01.01.2016 01.01.2021
12 01.01.2017 01.01.2019
12 01.01.2012 01.01.2021
13 01.01.2011 01.01.2021
13 01.01.2014 01.01.2020
13 01.01.2010 01.01.2018
13 01.01.2013 01.01.2016
14 01.01.2012 01.01.2021
14 01.01.2016 01.01.2020
14 01.01.2017 01.01.2021
14 01.01.2018 01.01.2021
I want to group by ID and leave records with the newest DATE_2 and the oldest DATE_1. How to do it?
>Solution :
IIUC, this is a classical groupby+agg. You need to set the dates to a datetime type for meaningful comparisons:
(df
.assign(DATE_1=pd.to_datetime(df['DATE_1']),
DATE_2=pd.to_datetime(df['DATE_2'])
)
.groupby('ID')
.agg({'DATE_1': 'min', 'DATE_2': 'max'})
)
output:
DATE_1 DATE_2
ID
12 2012-01-01 2021-01-01
13 2010-01-01 2021-01-01
14 2012-01-01 2021-01-01