I have a DataFrame like this:
import pandas as pd

data = {'event': [1, 1, 3, 2, 3, 1, 2, 3, 4, 5],
        'team': ['A', 'B', 'B', 'A', 'B', 'C', 'D', 'C', 'D', 'D'],
        'timeStamp': ['2023-07-23 14:57:13.357', '2023-07-23 14:57:14.357', '2023-07-23 14:57:15.357', '2023-07-23 14:57:16.357', '2023-07-23 14:57:20.357',
                      '2023-07-23 14:57:13.357', '2023-07-23 14:57:18.357', '2023-07-23 14:57:23.357', '2023-07-23 14:57:23.357', '2023-07-23 14:57:25.357']}
df = pd.DataFrame(data)
For each row, I need to determine whether the same team has a row with event 2 within the 7 seconds that follow.
The output should look like this:
| event | team | timeStamp | is_2_in_7_sec |
|---|---|---|---|
| 1 | A | 2023-07-23 14:57:13.357 | TRUE |
| 1 | A | 2023-07-23 14:57:14.357 | TRUE |
| 3 | B | 2023-07-23 14:57:15.357 | FALSE |
| 2 | A | 2023-07-23 14:57:16.357 | FALSE |
| 3 | B | 2023-07-23 14:57:20.357 | FALSE |
| 1 | C | 2023-07-23 14:57:13.357 | TRUE |
| 2 | C | 2023-07-23 14:57:18.357 | FALSE |
| 3 | C | 2023-07-23 14:57:23.357 | FALSE |
| 4 | D | 2023-07-23 14:57:23.357 | FALSE |
| 5 | D | 2023-07-23 14:57:25.357 | FALSE |
In R this is quite easy, but in Python I'm stuck on this task.
>Solution :
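You can set up a boolean column that marks event 2, reverse the rows so that the time-based window looks forward instead of backward, take a groupby.rolling max over a 7-second window per team, then merge the result back into the original DataFrame. The timeStamp column must hold datetimes (not strings) for the '7s' window to work: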
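import pandas as pd

# rolling('7s') needs real datetimes, not strings
df['timeStamp'] = pd.to_datetime(df['timeStamp'])

# [::-1] reverses the rows so each 7-second window looks forward in time
out = df.merge(df.assign(is_2_in_7_sec=df['event'].eq(2))[::-1]
                 .groupby(df['team'])
                 .rolling('7s', on='timeStamp')
                 ['is_2_in_7_sec'].max().astype(bool)
                 .reset_index()
               )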
Output:
event team timeStamp is_2_in_7_sec
0 1 A 2023-07-23 14:57:13.357 True
1 1 A 2023-07-23 14:57:14.357 True
2 3 B 2023-07-23 14:57:15.357 False
3 2 A 2023-07-23 14:57:16.357 True
4 3 B 2023-07-23 14:57:20.357 False
5 1 C 2023-07-23 14:57:13.357 False
6 2 D 2023-07-23 14:57:18.357 True
7 3 C 2023-07-23 14:57:23.357 False
8 4 D 2023-07-23 14:57:23.357 False
9 5 D 2023-07-23 14:57:25.357 False
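
Note that the rolling window includes the current row, so rows whose own event is 2 (index 3 and 6 above) come out True, whereas the expected table in the question marks such rows FALSE. If the current row should not count, one possible alternative is a self-merge against the event-2 rows of the same team. This is only a sketch: the names `twos`, `ts2`, `pairs` and `hit` are made up for illustration, the rows are copied from the output above, and it assumes "within 7 seconds after" means strictly later and at most 7 seconds later.

```python
import pandas as pd

# Rows copied from the output above (teams as printed there).
df = pd.DataFrame({
    'event': [1, 1, 3, 2, 3, 1, 2, 3, 4, 5],
    'team': ['A', 'A', 'B', 'A', 'B', 'C', 'D', 'C', 'D', 'D'],
    'timeStamp': pd.to_datetime([
        '2023-07-23 14:57:13.357', '2023-07-23 14:57:14.357',
        '2023-07-23 14:57:15.357', '2023-07-23 14:57:16.357',
        '2023-07-23 14:57:20.357', '2023-07-23 14:57:13.357',
        '2023-07-23 14:57:18.357', '2023-07-23 14:57:23.357',
        '2023-07-23 14:57:23.357', '2023-07-23 14:57:25.357',
    ]),
})

# Timestamps of every event-2 row, keyed by team ("ts2" is an illustrative name).
twos = (df.loc[df['event'].eq(2), ['team', 'timeStamp']]
          .rename(columns={'timeStamp': 'ts2'}))

# Pair each row with every event-2 row of the same team; teams without
# an event 2 get NaT, which compares as False below.
pairs = df.reset_index().merge(twos, on='team', how='left')
delta = pairs['ts2'] - pairs['timeStamp']

# Assumption: "within 7 seconds after" means 0 < delta <= 7 seconds.
pairs['hit'] = (delta > pd.Timedelta(0)) & (delta <= pd.Timedelta(seconds=7))

# Collapse back to one flag per original row (aligned on the original index).
df['is_2_in_7_sec'] = pairs.groupby('index')['hit'].any()
print(df)
```

With this variant the rows at index 3 and 6 become False, because an event-2 row no longer matches itself. The merge creates one row per (row, event-2 row) pair within a team, so it may use noticeably more memory than the rolling approach on large groups.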