I have a data frame with events:
| | start_event | end_event |
|---|---|---|
| 1 | 2023-09-23 17:20 | 2023-09-23 17:40 |
| 2 | 2023-09-24 10:00 | 2023-09-24 10:20 |
| 3 | 2023-09-24 10:20 | 2023-09-24 10:40 |
| 4 | 2023-09-24 10:40 | 2023-09-24 11:00 |
| 5 | 2023-09-25 11:00 | 2023-09-25 11:20 |
| 6 | 2023-09-25 11:20 | 2023-09-25 11:40 |
The difference between start_event and end_event is always 20 minutes. I need to merge events into a single row whenever a new event starts immediately after the previous one ends:
| | start_event | end_event |
|---|---|---|
| 1 | 2023-09-23 17:20 | 2023-09-23 17:40 |
| 2 | 2023-09-24 10:00 | 2023-09-24 11:00 |
| 3 | 2023-09-25 11:00 | 2023-09-25 11:40 |
The maximum duration of an event is one hour, and there may be several events in one day.
>Solution :
Try:
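import pandas as pd

# convert start_event/end_event to datetime (if necessary):
df["start_event"] = pd.to_datetime(df["start_event"])
df["end_event"] = pd.to_datetime(df["end_event"])

# every event lasts 20 minutes, so back-to-back events have end times exactly
# 20 minutes apart; any other difference marks the start of a new group
out = df.groupby(
    (df["end_event"].diff() != pd.Timedelta("20 minutes")).cumsum(), as_index=False
).agg({"start_event": "first", "end_event": "last"})
print(out)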
Prints:
start_event end_event
0 2023-09-23 17:20:00 2023-09-23 17:40:00
1 2023-09-24 10:00:00 2023-09-24 11:00:00
2 2023-09-25 11:00:00 2023-09-25 11:40:00
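
The answer above relies on every event lasting exactly 20 minutes. If that might not hold, a possible variation is to compare each start time with the previous end time directly. The sketch below rebuilds the sample data from the question so it runs on its own; the names `new_group` and `group_id` are only illustrative, and it does not enforce the one-hour maximum mentioned in the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "start_event": pd.to_datetime([
            "2023-09-23 17:20", "2023-09-24 10:00", "2023-09-24 10:20",
            "2023-09-24 10:40", "2023-09-25 11:00", "2023-09-25 11:20",
        ]),
        "end_event": pd.to_datetime([
            "2023-09-23 17:40", "2023-09-24 10:20", "2023-09-24 10:40",
            "2023-09-24 11:00", "2023-09-25 11:20", "2023-09-25 11:40",
        ]),
    }
)

# A new group begins wherever an event does not start exactly when the
# previous one ended (the first row compares against NaT, so it is True).
new_group = df["start_event"] != df["end_event"].shift()

# The running sum of the boolean flags gives each run of events one id.
group_id = new_group.cumsum()

# Keep the first start and the last end within each run.
out = (
    df.groupby(group_id)
    .agg(start_event=("start_event", "first"), end_event=("end_event", "last"))
    .reset_index(drop=True)
)
print(out)
```

On the sample data this should produce the same three rows as the `diff()` version, since both conditions are equivalent when all events have the same length.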