assigning group ID for pivoting, based on recurrent values in two columns

June 8, 2022

I have a long data frame in which a start and end day are assigned to an action. Some actions might only have a start day, and one action type can start and end multiple times.

I would now like to pivot this wide, so that there is one row for each new start of one action. (see desired output). For this I need to assign a unique ID which clearly identifies the group by which I am pivoting.

I think I could solve this with rle, but I cannot figure out how to use the information of two columns to assign the correct ID for pivoting.

library(tidyverse)
library(data.table)

x <- c("start", "end")
foo <- data.frame(time = c(rep(x, 3), "start", x, "start"), 
           action = rep(letters[1:4], times = c(4,2,3,1)), 
           day = 1:10)

## my approach gets stuck with the second rle 
foo %>%
  mutate(rle_time = rleid(time), 
         rle_action = rleid(action))
#>     time action day rle_time rle_action
#> 1  start      a   1        1          1
#> 2    end      a   2        2          1
#> 3  start      a   3        3          1
#> 4    end      a   4        4          1
#> 5  start      b   5        5          2
#> 6    end      b   6        6          2
#> 7  start      c   7        7          3
#> 8  start      c   8        7          3
#> 9    end      c   9        8          3
#> 10 start      d  10        9          4
## don't know how to use this information to assign the correct ID to the rows 
## so that I can pivot wider correctly. 


## desired output
data.frame(action = rep(letters[1:4], times = c(2,1,2,1)), 
           start = c(seq(1,7,2),8,10), 
           end = c(seq(2,6,2), NA,9,NA)
           )
#>   action start end
#> 1      a     1   2
#> 2      a     3   4
#> 3      b     5   6
#> 4      c     7  NA
#> 5      c     8   9
#> 6      d    10  NA

^{Created on 2022-06-07 by the reprex package (v2.0.1)}

>Solution :

Here, we could use cumsum to create the index column after grouping by ‘action’ and then do reshaping to ‘wide’ format with either pivot_wider (from tidyr) or use data.table::dcast

library(dplyr)
library(tidyr)
foo %>% 
   group_by(action) %>%
   mutate(rn = cumsum(time == 'start')) %>% 
   ungroup %>% 
   pivot_wider(names_from = time, values_from = day) %>% 
   select(-rn)