Comparing one date column to another in a different row

I have a dataframe,

df_corp
ID        arrival_date         leaving_date
1           01/02/20             05/02/20
2           01/03/20             07/03/20
1           12/02/20             20/02/20
1           07/03/20             10/03/20
2           10/03/20             15/03/20

I would like to find the difference between leaving_date of a row and arrival date of the next entry with respect to ID. Basically I want to know how long before they book again.

So it’ll look something like this.

df_corp
ID        arrival_date         leaving_date       time_between
1           01/02/20             05/02/20             NaN
2           01/03/20             07/03/20             NaN
1           12/02/20             20/02/20              7
1           07/03/20             10/03/20             15
2           10/03/20             15/03/20              3

I’ve tried grouping by ID to do the sum but I’m seriously lost on how to get the value from the next row and a different column in one.

Thanks for any help.

>Solution :

You need to convert to_datetime and to perform a GroupBy.shift to get the previous departure date:

# arrival
a = pd.to_datetime(df_corp['arrival_date'], dayfirst=True)
# previous departure per ID
l = pd.to_datetime(df_corp['leaving_date'], dayfirst=True).groupby(df_corp['ID']).shift()
# difference in days
df_corp['time_between'] = (a-l).dt.days

output:

   ID arrival_date leaving_date  time_between
0   1     01/02/20     05/02/20           NaN
1   2     01/03/20     07/03/20           NaN
2   1     12/02/20     20/02/20           7.0
3   1     07/03/20     10/03/20          16.0
4   2     10/03/20     15/03/20           3.0

Leave a Reply