Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Calculate time difference from row before within groups

OK, I’m super confused here.

I had a look at this post (Time difference within group by objects in Python Pandas) and others and tried to follow them. Still I’m getting wonky answers.

What I would like to do is calculate the time in minutes since the previous row within each person’s ID, and the first reading for each person should also = 30 minutes.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Here’s what I tried:

import pandas as pd

ids = [1,1,1,2,2,2,3,3,3,4]
dates = ["12/25/2021 7:47:01", "6/23/2021 20:02:32", "2/24/2021 18:57:40", "2/26/2021 20:17", "3/7/2021 11:23:01", "3/11/2021 17:48:55", "3/12/2021 19:22:32", "3/25/2021 11:11:11", "4/4/2021 14:53:40",  "4/6/2021 13:07:07"]

df = pd.DataFrame({'ids':ids,
                   'dates':dates})

df['dates'] = pd.to_datetime(df['dates'])

df = df.sort_values(by = ['ids', 'dates'])

print(df)

   ids               dates
2    1 2021-02-24 18:57:40
1    1 2021-06-23 20:02:32
0    1 2021-12-25 07:47:01
3    2 2021-02-26 20:17:00
4    2 2021-03-07 11:23:01
5    2 2021-03-11 17:48:55
6    3 2021-03-12 19:22:32
7    3 2021-03-25 11:11:11
8    3 2021-04-04 14:53:40
9    4 2021-04-06 13:07:07

df = df.assign(
    timediff=df.groupby(['ids']).dates.diff().dt.seconds.div(60).fillna(30))

print(df)

   ids               dates    timediff
2    1 2021-02-24 18:57:40   30.000000
1    1 2021-06-23 20:02:32   64.866667
0    1 2021-12-25 07:47:01  704.483333
3    2 2021-02-26 20:17:00   30.000000
4    2 2021-03-07 11:23:01  906.016667
5    2 2021-03-11 17:48:55  385.900000
6    3 2021-03-12 19:22:32   30.000000
7    3 2021-03-25 11:11:11  948.650000
8    3 2021-04-04 14:53:40  222.483333
9    4 2021-04-06 13:07:07   30.000000

As you can see, I’m getting weird results. Here is something like what I would expect (just ballparked numbers):

   ids               dates    timediff
2    1 2021-02-24 18:57:40   30.000000
1    1 2021-06-23 20:02:32   171360.000000
0    1 2021-12-25 07:47:01  262800.000000
3    2 2021-02-26 20:17:00   30.000000
4    2 2021-03-07 11:23:01  43800.0000
5    2 2021-03-11 17:48:55  34320.00000
6    3 2021-03-12 19:22:32   30.000000
7    3 2021-03-25 11:11:11  18720.00000
8    3 2021-04-04 14:53:40  14400.00000
9    4 2021-04-06 13:07:07   30.000000

What’s going on here and how could I fix it? Thanks in advance.

>Solution :

You need to use total_seconds() instead of seconds. The latter only includes the time within a day, but your time differences are larger than one day.

Ref: https://pandas.pydata.org/docs/reference/api/pandas.Series.dt.total_seconds.html vs https://pandas.pydata.org/docs/reference/api/pandas.Series.dt.seconds.html

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading