Keep the first TWO dates for each id

I have a dataframe of id numbers and dates:

import pandas as pd
df = pd.DataFrame([['1','01/01/2000'], ['1','01/07/2002'],['1', '04/05/2003'],
                  ['2','01/05/2010'], ['2','08/08/2009'],
                  ['3','12/11/2008']], columns=['id','start_date'])
df

    id  start_date
0   1   01/01/2000
1   1   01/07/2002
2   1   04/05/2003
3   2   01/05/2010
4   2   08/08/2009
5   3   12/11/2008

I am looking for a way to keep, for each id, only the first TWO dates (i.e. the two earliest dates).

For the example above, the output would be:

    id  start_date
0   1   01/01/2000
1   1   01/07/2002
2   2   08/08/2009
3   2   01/05/2010
4   3   12/11/2008

Thanks!

>Solution :

Full code:

import pandas as pd

df = pd.DataFrame([['1','01/01/2000'], ['1','01/07/2002'],['1', '04/05/2003'],
                  ['2','01/05/2010'], ['2','08/08/2009'],
                  ['3','12/11/2008']], columns=['id','start_date'])

# 1. convert the 'start_date' column to datetime

df['start_date'] = pd.to_datetime(df['start_date'])

# 2. sort the dataframe in ascending order by 'start_date'

df.sort_values(by='start_date', ascending=True, inplace=True)

# 3. select only the first two occurrences of each id

df.groupby('id').head(2)

Output (rows are ordered by date, not by id, and keep their original index labels):

    id  start_date
0   1   2000-01-01
1   1   2002-01-07
5   3   2008-12-11
4   2   2009-08-08
3   2   2010-01-05
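
If you need the result in the same shape as the expected output in the question (grouped by id, original date strings, fresh 0..n index), here is a minimal sketch of one way to do it. It assumes the strings are month/day/year, which is what the output above implies; swap in '%d/%m/%Y' if your data is day/month/year. The `_parsed` helper column and the `result` name are only for illustration.

```python
import pandas as pd

df = pd.DataFrame([['1','01/01/2000'], ['1','01/07/2002'], ['1','04/05/2003'],
                   ['2','01/05/2010'], ['2','08/08/2009'],
                   ['3','12/11/2008']], columns=['id','start_date'])

# Assumption: dates are month/day/year. Use '%d/%m/%Y' for day/month/year data.
parsed = pd.to_datetime(df['start_date'], format='%m/%d/%Y')

result = (
    df.assign(_parsed=parsed)            # temporary helper column (illustrative name)
      .sort_values(['id', '_parsed'])    # order by id first, then by actual date
      .groupby('id')
      .head(2)                           # keep the two earliest rows per id
      .drop(columns='_parsed')           # keep the original date strings only
      .reset_index(drop=True)            # renumber rows 0..n-1
)
print(result)
```

Sorting on the parsed dates rather than on the raw strings matters here: as text, '01/05/2010' would sort before '08/08/2009'.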
