Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

assigning group ID for pivoting, based on recurrent values in two columns

I have a long data frame in which a start and end day are assigned to an action. Some actions might only have a start day, and one action type can start and end multiple times.

I would now like to pivot this wide, so that there is one row for each new start of one action. (see desired output). For this I need to assign a unique ID which clearly identifies the group by which I am pivoting.

I think I could solve this with rle, but I cannot figure out how to use the information of two columns to assign the correct ID for pivoting.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

library(tidyverse)
library(data.table)

x <- c("start", "end")
foo <- data.frame(time = c(rep(x, 3), "start", x, "start"), 
           action = rep(letters[1:4], times = c(4,2,3,1)), 
           day = 1:10)

## my approach gets stuck with the second rle 
foo %>%
  mutate(rle_time = rleid(time), 
         rle_action = rleid(action))
#>     time action day rle_time rle_action
#> 1  start      a   1        1          1
#> 2    end      a   2        2          1
#> 3  start      a   3        3          1
#> 4    end      a   4        4          1
#> 5  start      b   5        5          2
#> 6    end      b   6        6          2
#> 7  start      c   7        7          3
#> 8  start      c   8        7          3
#> 9    end      c   9        8          3
#> 10 start      d  10        9          4
## don't know how to use this information to assign the correct ID to the rows 
## so that I can pivot wider correctly. 


## desired output
data.frame(action = rep(letters[1:4], times = c(2,1,2,1)), 
           start = c(seq(1,7,2),8,10), 
           end = c(seq(2,6,2), NA,9,NA)
           )
#>   action start end
#> 1      a     1   2
#> 2      a     3   4
#> 3      b     5   6
#> 4      c     7  NA
#> 5      c     8   9
#> 6      d    10  NA

Created on 2022-06-07 by the reprex package (v2.0.1)

>Solution :

Here, we could use cumsum to create the index column after grouping by ‘action’ and then do reshaping to ‘wide’ format with either pivot_wider (from tidyr) or use data.table::dcast

library(dplyr)
library(tidyr)
foo %>% 
   group_by(action) %>%
   mutate(rn = cumsum(time == 'start')) %>% 
   ungroup %>% 
   pivot_wider(names_from = time, values_from = day) %>% 
   select(-rn)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading