How can I measure if there is overlap in begin time to end time within each group using Python?

Advertisements

How can I see if there is overlap between start time and/or end time for each group (by ID). That is to say if two "services" occurred together for any length of time from one employee (ID). I have a table like the following, but would like to calculate the overlap column.

| ID |   Begin Time   |    End Time    | Overlap |
| 1  | 1/1/2023 13:30 | 1/1/2023 13:55 |  False  |
| 1  | 1/7/2023 12:30 | 1/1/2023 13:45 |  False  |
| 2  | 1/3/2023 15:30 | 1/3/2023 16:30 |   True  |
| 1  | 1/5/2023 07:30 | 1/5/2023 08:30 |   True  |
| 2  | 1/3/2023 14:55 | 1/3/2023 15:55 |   True  |
| 1  | 1/5/2023 06:30 | 1/5/2023 09:30 |   True  |
| 1  | 1/7/2023 06:30 | 1/7/2023 09:30 |   True  |
| 1  | 1/7/2023 06:00 | 1/7/2023 06:45 |   True  |

Here is a chunk of code that creates this dataframe –>

id_list = [1,1,2,1,2,1,1,1]
begin_time = ['1/1/2023 13:30', '1/7/2023 12:30', '1/3/2023 15:30', '1/5/2023 07:30', '1/3/2023 14:55', 
             '1/5/2023 06:30', '1/7/2023 06:30', '1/7/2023 06:00']
end_time = ['1/1/2023 13:55', '1/1/2023 13:45', '1/3/2023 16:30', '1/5/2023 08:30', '1/3/2023 15:55',
           '1/5/2023 09:30', '1/7/2023 09:30', '1/7/2023 06:45']
df = pd.DataFrame(list(zip(id_list, begin_time, end_time)), columns = ['ID', 'Begin_Time', 'End_Time'])
df['Begin_Time'] = pd.to_datetime(df['Begin_Time'])
df['End_Time'] = pd.to_datetime(df['End_Time'])
df

>Solution :

Use Interval.overlaps in custom function with enumerate for filter out itself Interval:

def f(x):
    i = pd.IntervalIndex.from_arrays(x['Begin_Time'],
                                     x['End_Time'], 
                                 closed="both")
    a = np.arange(len(x))
    x['overlap'] = [i[a != j].overlaps(y).any() for j, y in enumerate(i) ]
    return x

df = df.groupby('ID').apply(f)
print (df)
   ID          Begin_Time            End_Time  overlap
0   1 2023-01-01 13:30:00 2023-01-01 13:55:00    False
1   1 2023-01-08 12:30:00 2023-01-08 13:45:00    False <- data was changed
2   2 2023-01-03 15:30:00 2023-01-03 16:30:00     True
3   1 2023-01-05 07:30:00 2023-01-05 08:30:00     True
4   2 2023-01-03 14:55:00 2023-01-03 15:55:00     True
5   1 2023-01-05 06:30:00 2023-01-05 09:30:00     True
6   1 2023-01-07 06:30:00 2023-01-07 09:30:00     True
7   1 2023-01-07 06:00:00 2023-01-07 06:45:00     True

Leave a Reply Cancel reply