Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How can I measure if there is overlap in begin time to end time within each group using Python?

How can I see if there is overlap between start time and/or end time for each group (by ID). That is to say if two "services" occurred together for any length of time from one employee (ID). I have a table like the following, but would like to calculate the overlap column.

| ID |   Begin Time   |    End Time    | Overlap |
| 1  | 1/1/2023 13:30 | 1/1/2023 13:55 |  False  |
| 1  | 1/7/2023 12:30 | 1/1/2023 13:45 |  False  |
| 2  | 1/3/2023 15:30 | 1/3/2023 16:30 |   True  |
| 1  | 1/5/2023 07:30 | 1/5/2023 08:30 |   True  |
| 2  | 1/3/2023 14:55 | 1/3/2023 15:55 |   True  |
| 1  | 1/5/2023 06:30 | 1/5/2023 09:30 |   True  |
| 1  | 1/7/2023 06:30 | 1/7/2023 09:30 |   True  |
| 1  | 1/7/2023 06:00 | 1/7/2023 06:45 |   True  |

Here is a chunk of code that creates this dataframe –>

id_list = [1,1,2,1,2,1,1,1]
begin_time = ['1/1/2023 13:30', '1/7/2023 12:30', '1/3/2023 15:30', '1/5/2023 07:30', '1/3/2023 14:55', 
             '1/5/2023 06:30', '1/7/2023 06:30', '1/7/2023 06:00']
end_time = ['1/1/2023 13:55', '1/1/2023 13:45', '1/3/2023 16:30', '1/5/2023 08:30', '1/3/2023 15:55',
           '1/5/2023 09:30', '1/7/2023 09:30', '1/7/2023 06:45']
df = pd.DataFrame(list(zip(id_list, begin_time, end_time)), columns = ['ID', 'Begin_Time', 'End_Time'])
df['Begin_Time'] = pd.to_datetime(df['Begin_Time'])
df['End_Time'] = pd.to_datetime(df['End_Time'])
df

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

Use Interval.overlaps in custom function with enumerate for filter out itself Interval:

def f(x):
    i = pd.IntervalIndex.from_arrays(x['Begin_Time'],
                                     x['End_Time'], 
                                 closed="both")
    a = np.arange(len(x))
    x['overlap'] = [i[a != j].overlaps(y).any() for j, y in enumerate(i) ]
    return x

df = df.groupby('ID').apply(f)
print (df)
   ID          Begin_Time            End_Time  overlap
0   1 2023-01-01 13:30:00 2023-01-01 13:55:00    False
1   1 2023-01-08 12:30:00 2023-01-08 13:45:00    False <- data was changed
2   2 2023-01-03 15:30:00 2023-01-03 16:30:00     True
3   1 2023-01-05 07:30:00 2023-01-05 08:30:00     True
4   2 2023-01-03 14:55:00 2023-01-03 15:55:00     True
5   1 2023-01-05 06:30:00 2023-01-05 09:30:00     True
6   1 2023-01-07 06:30:00 2023-01-07 09:30:00     True
7   1 2023-01-07 06:00:00 2023-01-07 06:45:00     True
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading