I have the following data
data.orig <- data.frame(
id = c(1,1,1,1,1,1,1, 1),
start = c(0, 47, 103, 194, 277, 350, 453, 522),
end = c(47, 103, 194, 277, 350, 453, 522, 603),
event = c(1, 0, 0, 1, 0, 0, 0, 0)
)
I would like to obtain the following dataset, collapsing the rows that contain the same event value by id, but having repeated values
data.targ <- data.frame(
id = c(1,1,1,1),
start = c(0, 47, 194, 277),
end = c(47, 194, 277, 603),
event = c(1, 0, 1, 0)
)
I have already tried group_by and summarize, but I do not find the correct combination
>Solution :
You are fairly close! The issue with directly using group-by is it is impossible to differentiate same id and event in non-consecutive rows. Therefore, we can add consecutive_id:
library(dplyr)
data.orig %>%
group_by(grp=consecutive_id(id,event),id,event) %>%
reframe(start=min(start,na.rm=TRUE),
end=max(end,na.rm=TRUE)) %>%
select(-grp)
# A tibble: 4 × 4
id event start end
<dbl> <dbl> <dbl> <dbl>
1 1 1 0 47
2 1 0 47 194
3 1 1 194 277
4 1 0 277 603
Alternatively, we can use replace min/max with head/tail:
data.orig %>%
group_by(grp=consecutive_id(id,event),id,event) %>%
reframe(start=head(start,1),
end=tail(end,1)) %>%
select(-grp)
# A tibble: 4 × 4
id event start end
<dbl> <dbl> <dbl> <dbl>
1 1 1 0 47
2 1 0 47 194
3 1 1 194 277
4 1 0 277 603