Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to reset a sequence in a R dataframe's column, based on the levels of a different column

I have data that logs subject and session information in columns like so:

> Subject <- c(rep("A", 6), rep("B", 7))
> Session <- c(rep(1, 3), rep(2, 3), rep(8, 4), rep(9, 3))
> df <- data.frame(Subject, Session)
> df
   Subject Session
1        A       1
2        A       1
3        A       1
4        A       2
5        A       2
6        A       2
7        B       8
8        B       8
9        B       8
10       B       8
11       B       9
12       B       9
13       B       9

The problem is that the raw data does not always have consistent session numbering (hence why subject B’s first session starts at session 8 and not 1). Ideally, it would be helpful to have a column that resets the session numbering when each new subject appears. Like this:

   Subject Session New_Sess
1        A       1        1
2        A       1        1
3        A       1        1
4        A       2        2
5        A       2        2
6        A       2        2
7        B       8        1
8        B       8        1
9        B       8        1
10       B       8        1
11       B       9        2
12       B       9        2
13       B       9        2

I’ve attempted a solution with nested for loops and could not manage to get that to work. I suspect there must be a straightforward solution to this (it doesn’t seem like it should be that difficult) but despite my best efforts, it has evaded me completely.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

The actual dataset is quite large and cumbersome, so an automated solution to this problem would be helpful.

>Solution :

We may use match

library(dplyr)
df <- df %>%
   group_by(Subject) %>% 
   mutate(New_Sess = match(Session, unique(Session))) %>%
   ungroup

-output

df
# A tibble: 13 × 3
   Subject Session New_Sess
   <chr>     <dbl>    <int>
 1 A             1        1
 2 A             1        1
 3 A             1        1
 4 A             2        2
 5 A             2        2
 6 A             2        2
 7 B             8        1
 8 B             8        1
 9 B             8        1
10 B             8        1
11 B             9        2
12 B             9        2
13 B             9        2

Or with factor

df <-  df %>%
     group_by(Subject) %>% 
     mutate(New_Sess = as.integer(factor(Session))) %>% 
     ungroup
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading