Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Relative row index based on a variable value in R

wondering if there was an efficient and terse/concise data.table solution to the following problem.

Please suppose that I have the following data.table:

library(data.table)

DT <- data.table(store = c("A", "A", "A", "A", "B", "B", "B", "B"),
                 time = c(1,2,3,4,1,2,3,4),
                 treat_time = c(0,0,1,0, 0,1,0,0))

Here, treat_time is the time period where the store receives a treatment. Note that the time of the treatment is different between stores A and stores B. I would like to create a column time_rel, which describes the time period relative to when treat_time = 1. That is, the data.table should look like:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel


DT_outcome <- data.table(store = c("A", "A", "A", "A", "B", "B", "B", "B"),
                 time = c(1,2,3,4,1,2,3,4),
                 treat_time = c(0,0,1,0, 0,1,0,0),
                 time_rel = c(-1,0,1,2, 0,1,2,3))


    store  time treat_time time_rel
   <char> <num>      <num>    <num>
1:      A     1          0       -1
2:      A     2          0        0
3:      A     3          1        1
4:      A     4          0        2
5:      B     1          0        0
6:      B     2          1        1
7:      B     3          0        2
8:      B     4          0        3

Thanks!

>Solution :

We can subtract from the sequence

library(data.table)
DT[,   time_rel := seq_len(.N) - seq_len(.N)[treat_time == 1] + 1 ,store]

-output

> DT
    store  time treat_time time_rel
   <char> <num>      <num>    <num>
1:      A     1          0       -1
2:      A     2          0        0
3:      A     3          1        1
4:      A     4          0        2
5:      B     1          0        0
6:      B     2          1        1
7:      B     3          0        2
8:      B     4          0        3

Or the same logic in dplyr

library(dplyr)
DT %>% 
  group_by(store) %>%
  mutate(time_rel = row_number() - which(treat_time == 1) + 1) %>% 
  ungroup
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading