I have a column of datetime strings:
import pandas as pd

d = {'dttm' : ['2023-01-30 09:02:42','2023-01-30 09:02:47','2023-01-30 09:02:48','2023-01-30 09:02:59',
'2023-01-30 09:10:23','2023-01-30 09:10:27','2023-01-30 09:12:36','2023-01-30 09:14:13',
'2023-01-30 09:14:15','2023-01-30 09:16:12','2023-01-30 09:16:12','2023-01-30 09:16:13']}
dt_data = pd.DataFrame(d)
2023-01-30 09:02:42
2023-01-30 09:02:47
2023-01-30 09:02:48
2023-01-30 09:02:59
2023-01-30 09:10:23
2023-01-30 09:10:27
2023-01-30 09:12:36
2023-01-30 09:14:13
2023-01-30 09:14:15
2023-01-30 09:16:12
2023-01-30 09:16:12
2023-01-30 09:16:13
I need to group the rows so that consecutive timestamps no more than 5 seconds apart share a group.
In other words, if the difference between a row's timestamp and the one in the row above it is less than or equal to five seconds, the row joins the previous group. Otherwise, it starts a new group with the next group number.
Expected output:
| dttm | group |
|---|---|
| 2023-01-30 09:02:42 | 1 |
| 2023-01-30 09:02:47 | 1 |
| 2023-01-30 09:02:48 | 1 |
| 2023-01-30 09:02:59 | 2 |
| 2023-01-30 09:10:23 | 3 |
| 2023-01-30 09:10:27 | 3 |
| 2023-01-30 09:12:36 | 4 |
| 2023-01-30 09:14:13 | 5 |
| 2023-01-30 09:14:15 | 5 |
| 2023-01-30 09:16:12 | 6 |
| 2023-01-30 09:16:12 | 6 |
| 2023-01-30 09:16:13 | 6 |
>Solution :
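Convert the "dttm" column with to_datetime, compute the diff between consecutive rows, and compare it to a threshold of "5s". Then use cumsum to increment the group number each time the gap is above the threshold (the first row's diff is NaT, which compares as False, so it falls in group 1 after add(1)):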
dt_data['group'] = pd.to_datetime(dt_data['dttm']).diff().gt('5s').cumsum().add(1)
Output:
                   dttm  group
0   2023-01-30 09:02:42      1
1   2023-01-30 09:02:47      1
2   2023-01-30 09:02:48      1
3   2023-01-30 09:02:59      2
4   2023-01-30 09:10:23      3
5   2023-01-30 09:10:27      3
6   2023-01-30 09:12:36      4
7   2023-01-30 09:14:13      5
8   2023-01-30 09:14:15      5
9   2023-01-30 09:16:12      6
10  2023-01-30 09:16:12      6
11  2023-01-30 09:16:13      6
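
If the one-liner is hard to follow, the same chain can be unpacked into named steps. This is only a sketch of the approach above: the intermediate names (ts, gaps, starts_new_group) are made up for illustration, and it assumes the column is already sorted in ascending order, as in the sample data.

```python
import pandas as pd

d = {'dttm': ['2023-01-30 09:02:42', '2023-01-30 09:02:47', '2023-01-30 09:02:48',
              '2023-01-30 09:02:59', '2023-01-30 09:10:23', '2023-01-30 09:10:27',
              '2023-01-30 09:12:36', '2023-01-30 09:14:13', '2023-01-30 09:14:15',
              '2023-01-30 09:16:12', '2023-01-30 09:16:12', '2023-01-30 09:16:13']}
dt_data = pd.DataFrame(d)

# Parse the strings into real timestamps (assumed sorted ascending).
ts = pd.to_datetime(dt_data['dttm'])

# Gap between each row and the row above it; the first row has no
# predecessor, so its gap is NaT.
gaps = ts.diff()

# True where the gap is strictly greater than 5 seconds, i.e. where a new
# group starts. NaT compares as False, so the first row stays in group 1.
starts_new_group = gaps > pd.Timedelta(seconds=5)

# A running count of the True values gives a 0-based group id; add 1 so
# numbering starts at 1.
dt_data['group'] = starts_new_group.cumsum() + 1

print(dt_data)
```

If the data might not be sorted, sorting by the parsed timestamps first (for example with sort_values) would be needed before taking the diff, since the gap is only meaningful between neighbouring rows in time order.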