Let's imagine I have a data frame like this:
| event | time | type |
|---|---|---|
| 1 | 2022-07-15 18:08:05 | def |
| 2 | 2022-07-15 18:08:06 | att |
| 3 | 2022-07-15 18:09:00 | def |
| 4 | 2022-07-15 18:09:12 | def |
| 5 | 2022-07-15 18:13:26 | def |
| 6 | 2022-07-15 18:13:36 | att |
| 7 | 2022-07-15 18:19:05 | def |
| 8 | 2022-07-15 18:21:43 | def |
| 9 | 2022-07-15 18:26:06 | att |
| 10 | 2022-07-15 18:27:26 | def |
I want to keep only the rows with type 'att' that occurred within 15 seconds after a row with type 'def'.
So the desired output is:
| event | time | type |
|---|---|---|
| 2 | 2022-07-15 18:08:06 | att |
| 6 | 2022-07-15 18:13:36 | att |
Could you please help me figure this out?
>Solution :
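You could use dplyr::filter() and dplyr::lag() here. This assumes your time variable is in datetime format, the rows are sorted by time, and the 'def' you care about is the row immediately before the 'att':
library(dplyr)
df %>%
  filter(type == "att",
         lag(type) == "def",
         # set the units explicitly; plain `time - lag(time)` lets R pick them,
         # so comparing that result with 15 is only safe if it happens to be in seconds
         difftime(time, lag(time), units = "secs") < 15)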
#   event                time type
# 1     2 2022-07-15 18:08:06  att
# 2     6 2022-07-15 18:13:36  att
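A note on the units of the time difference, since the comparison with 15 depends on them. Subtracting two POSIXct values returns a difftime whose units R chooses automatically, while difftime() with units = "secs" always returns seconds. The small sketch below illustrates this with two timestamps taken from the example data; the variable names are only for illustration.

```r
# Two timestamps from the example data, 5 minutes 21 seconds apart
t1 <- as.POSIXct("2022-07-15 18:08:05", tz = "UTC")
t2 <- as.POSIXct("2022-07-15 18:13:26", tz = "UTC")

# Plain subtraction: units are picked automatically (minutes here)
gap_auto <- t2 - t1
units(gap_auto)       # "mins"
as.numeric(gap_auto)  # 5.35, numerically below 15 although the gap is 321 seconds

# Explicit units: always seconds, so comparing with 15 is unambiguous
gap_secs <- difftime(t2, t1, units = "secs")
as.numeric(gap_secs)  # 321
gap_secs < 15         # FALSE
```

For a whole column the automatic units are chosen from the smallest difference, so the plain subtraction happens to give seconds on the example data, but that is not guaranteed for other data.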
Data:
df <- read.table(text = "event time type
1 '2022-07-15 18:08:05' def
2 '2022-07-15 18:08:06' att
3 '2022-07-15 18:09:00' def
4 '2022-07-15 18:09:12' def
5 '2022-07-15 18:13:26' def
6 '2022-07-15 18:13:36' att
7 '2022-07-15 18:19:05' def
8 '2022-07-15 18:21:43' def
9 '2022-07-15 18:26:06' att
10 '2022-07-15 18:27:26' def", header = TRUE)
df$time <- lubridate::as_datetime(df$time)
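If the 'def' row does not have to be the row immediately before the 'att' row, one possible variant is to carry the time of the most recent 'def' forward and compare against that. The following is a minimal sketch under that assumption, not part of the original answer; it additionally uses tidyr::fill(), and the helper column last_def and the result name res are names chosen here for illustration.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
df <- data.frame(
  event = 1:10,
  time = as.POSIXct(c(
    "2022-07-15 18:08:05", "2022-07-15 18:08:06", "2022-07-15 18:09:00",
    "2022-07-15 18:09:12", "2022-07-15 18:13:26", "2022-07-15 18:13:36",
    "2022-07-15 18:19:05", "2022-07-15 18:21:43", "2022-07-15 18:26:06",
    "2022-07-15 18:27:26"
  ), tz = "UTC"),
  type = c("def", "att", "def", "def", "def", "att", "def", "def", "att", "def")
)

res <- df %>%
  arrange(time) %>%
  # time of the row itself if it is a "def", NA otherwise
  mutate(last_def = replace(time, type != "def", NA)) %>%
  # carry the most recent "def" time down to the rows that follow it
  fill(last_def) %>%
  # keep "att" rows less than 15 seconds after that "def"
  filter(type == "att",
         difftime(time, last_def, units = "secs") < 15) %>%
  select(-last_def)

res
```

On the example data this keeps events 2 and 6, the same rows as the lag() approach. The two approaches differ when several 'att' rows follow one 'def' within 15 seconds: this variant keeps all of them, while the lag() version keeps only the first.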