Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How can I calculate sum by overlapping time intervals by grouping each name?

How can I calculate sum by overlapping time intervals by grouping each name. Basically the smaller interval should be merged with larger interval if that group name.

input

df1 = (pd.DataFrame({'name': ['a', 'a', 'a', 'b', 'b'],
              'time_start': ['2000-01-01 00:01:12',
                            '2000-01-01 00:01:14',
                            '2000-01-01 00:03:12',
                            '2000-01-01 00:05:12',
                            '2000-01-01 00:05:16'],
              'time_end': ['2000-01-01 00:01:18',
                            '2000-01-01 00:01:16',
                            '2000-01-01 00:03:24',
                            '2000-01-01 00:05:40',
                            '2000-01-01 00:05:18'],
                    'values':[20,30,40,20,5]})
 .assign(time_start = lambda x: pd.to_datetime(x['time_start']),
        time_end = lambda x: pd.to_datetime(x['time_end'])))

output should be

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

name    time_start  time_end    values
0   a   2000-01-01 00:01:12 2000-01-01 00:01:18 50
1   a   2000-01-01 00:03:12 2000-01-01 00:03:24 40
2   b   2000-01-01 00:05:12 2000-01-01 00:05:40 25

>Solution :

You can use a groupby.shift then groupby.agg:

df1[['time_start', 'time_end']] = df1[['time_start', 'time_end']].apply(pd.to_datetime)

g = (~df1['time_start'].lt(df1.groupby('name')['time_end'].shift())).cumsum()

out = (df1.groupby(['name', g], as_index=False)
          .agg({'time_start': 'min',
                'time_end': 'max',
                'values': 'sum'})
      )

Output:

  name          time_start            time_end  values
0    a 2000-01-01 00:01:12 2000-01-01 00:01:18      50
1    a 2000-01-01 00:03:12 2000-01-01 00:03:24      40
2    b 2000-01-01 00:05:12 2000-01-01 00:05:40      25
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading