Collapse rows by columns in R repeated values

July 5, 2023

I have the following data

data.orig <- data.frame(
          id = c(1,1,1,1,1,1,1, 1),
          start = c(0, 47, 103, 194, 277, 350, 453, 522),
          end = c(47, 103, 194, 277, 350, 453, 522, 603),
          event = c(1, 0, 0, 1, 0, 0, 0, 0)
    )

I would like to obtain the following dataset, collapsing the rows that contain the same event value by id, but having repeated values

data.targ <- data.frame(
          id = c(1,1,1,1),
          start = c(0, 47, 194, 277),
          end = c(47, 194, 277, 603),
          event = c(1, 0, 1, 0)
    )

I have already tried group_by and summarize, but I do not find the correct combination

>Solution :

You are fairly close! The issue with directly using group-by is it is impossible to differentiate same id and event in non-consecutive rows. Therefore, we can add consecutive_id:

library(dplyr)
data.orig %>%
  group_by(grp=consecutive_id(id,event),id,event) %>%
  reframe(start=min(start,na.rm=TRUE),
            end=max(end,na.rm=TRUE)) %>%
  select(-grp)

# A tibble: 4 × 4
     id event start   end
  <dbl> <dbl> <dbl> <dbl>
1     1     1     0    47
2     1     0    47   194
3     1     1   194   277
4     1     0   277   603

Alternatively, we can use replace min/max with head/tail:

data.orig %>%
  group_by(grp=consecutive_id(id,event),id,event) %>%
  reframe(start=head(start,1),
          end=tail(end,1)) %>%
  select(-grp)

# A tibble: 4 × 4
     id event start   end
  <dbl> <dbl> <dbl> <dbl>
1     1     1     0    47
2     1     0    47   194
3     1     1   194   277
4     1     0   277   603