Create column that orders ID by first Start Date

March 21, 2023

Imagine I have the following dataframe:

ID  Start Date
1   1990-01-01
1   1990-01-01
1   1991-01-01
2   1991-01-01
2   1990-01-01
3   2002-01-01
3   2000-01-01
4   1991-01-01

What would be the best way to create a column named Order that, for each unique ID in the ID column, assigns 1 to the earliest Start Date and adds 1 for each subsequent Start Date (when dates are equal, their relative order doesn't matter), resulting in the following dataframe:

ID  Start Date  Order
1   1990-01-01  2
1   1990-01-01  3
1   1989-01-01  1
2   1991-01-01  2
2   1990-01-01  1
3   2002-01-01  2
3   2000-01-01  1
4   1991-01-01  1

Solution:

Use groupby.rank:

import pandas as pd
df['Start Date'] = pd.to_datetime(df['Start Date'])
# method='first' breaks ties by order of appearance within each ID
df['Order'] = df.groupby('ID')['Start Date'].rank(method='first', ascending=False).astype(int)

Output (ascending=False, so the latest Start Date in each ID gets 1):

   ID Start Date  Order
0   1 1990-01-01      2
1   1 1990-01-01      3
2   1 1991-01-01      1
3   2 1991-01-01      1
4   2 1990-01-01      2
5   3 2002-01-01      1
6   3 2000-01-01      2
7   4 1991-01-01      1

With ascending=True (the earliest Start Date in each ID gets 1):

   ID Start Date  Order
0   1 1990-01-01      1
1   1 1990-01-01      2
2   1 1991-01-01      3
3   2 1991-01-01      2
4   2 1990-01-01      1
5   3 2002-01-01      2
6   3 2000-01-01      1
7   4 1991-01-01      1
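
For a self-contained check, the sketch below rebuilds the sample frame from the question and applies rank with ascending=True. The Order_alt column is a hypothetical alternative that is not part of the original answer: it assumes a stable sort by date followed by groupby.cumcount gives the same numbering, and is shown only for comparison.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 3, 3, 4],
    "Start Date": [
        "1990-01-01", "1990-01-01", "1991-01-01", "1991-01-01",
        "1990-01-01", "2002-01-01", "2000-01-01", "1991-01-01",
    ],
})
df["Start Date"] = pd.to_datetime(df["Start Date"])

# Earliest date per ID gets 1; method="first" breaks ties by row order
df["Order"] = (
    df.groupby("ID")["Start Date"]
      .rank(method="first", ascending=True)
      .astype(int)
)

# Alternative sketch (assumption, not from the answer): stable sort by date,
# then number the rows within each ID; assignment realigns on the index
df["Order_alt"] = (
    df.sort_values("Start Date", kind="stable")
      .groupby("ID")
      .cumcount()
      .add(1)
)

print(df)
```

Both columns should match the Order column in the ascending=True table above.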