Creating a new variable by referring to the previous value, based on a condition from two other variables

I have been trying to create a variable based on multiple conditions from other variables, while referencing its previous value. Unfortunately nothing I tried seemed to work.
Any help would be greatly appreciated!

I have a dataframe like this:

df <- data.frame(
  ID = (rep(c(1, 2, 3), times = c(3, 6, 4))),
  threshold = c(NA, 2, 6, 
                NA, 2, 3, 7, 3, 7,
                NA, 7, 7, 2)
)

I am trying to create a new variable, new_var, that assigns 1 to the first row and keeps assigning the same number until threshold is >= 5. When that happens, new_var should increase by one and stay at that value until the next time threshold is greater than or equal to 5. Additionally, this rule should reset for every participant, so that the first entry for each participant starts at 1.
This is what the result for the current example should look like:

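df_expected <- data.frame(
  ID = rep(c(1, 2, 3), times = c(3, 6, 4)),
  threshold = c(NA, 2, 6,
                NA, 2, 3, 7, 3, 7,
                NA, 7, 7, 2),
  # new_var goes up by one on each row where threshold >= 5 and restarts at 1 for each ID
  new_var = c(1, 1, 2,
              1, 1, 1, 2, 2, 3,
              1, 2, 3, 3)
)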

I have tried grouping by ID, and using case_when to define the different options based on threshold. It is worth mentioning that the value of threshold is always missing for the first row of each participant.
I have used this code:

library(tidyverse)

df1 <- df %>%
  group_by(ID) %>%
  mutate(
    new_var = NA, #first had to create an empty variable so later I can refer to it
    new_var = case_when(
      is.na(threshold) == TRUE ~ 1,
      threshold < 5 ~ new_var[-1],
      threshold >= 5 ~ new_var[-1] + 1
    )
  ) %>%
  ungroup()
 

However, I keep getting this error:

Error in mutate():
! Problem while computing new_var = case_when(...).
ℹ The error occurred in group 1: ID = 1.
Caused by error in case_when():
! threshold < 5 ~ new_var[-1], threshold >= 5 ~ new_var[-1] + 1 must be length 3 or one, not 2.
Backtrace:

  1. … %>% ungroup()
  2. dplyr::case_when(…)

So according to my understanding, the problem is that when I try to refer to the previous value of new_var, the program treats it as a whole vector instead of as a single data point at each row. But maybe I’m wrong…
Is there a better way to refer to the previous value of a vector than by new_var[-1]?
Or perhaps a better approach to solving this whole puzzle?
I would be grateful to hear any insight!

Thank you!

Solution:

You could try:

library(dplyr)

df %>%
  group_by(ID) %>%
  # treat the missing first threshold as 0, then count how often 5 has been reached so far
  mutate(new_var = cumsum(coalesce(threshold, 0) >= 5) + 1) %>%
  ungroup()

Output:

# A tibble: 13 × 3
      ID threshold new_var
   <dbl>     <dbl>   <dbl>
 1     1        NA       1
 2     1         2       1
 3     1         6       2
 4     2        NA       1
 5     2         2       1
 6     2         3       1
 7     2         7       2
 8     2         3       2
 9     2         7       3
10     3        NA       1
11     3         7       2
12     3         7       3
13     3         2       3
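
To see why the cumulative sum gives the desired counter, the expression can be taken apart for a single participant. The following is only a sketch in base R, using the threshold values of ID 2 from the question; the intermediate names (filled, crossed) are made up for illustration and are not part of the answer's code.

```r
# threshold values for participant ID 2, copied from the question
threshold <- c(NA, 2, 3, 7, 3, 7)

# Step 1: replace the leading NA with 0 so the comparison never returns NA
# (this plays the role of coalesce(threshold, 0) in the dplyr version)
filled <- ifelse(is.na(threshold), 0, threshold)

# Step 2: TRUE on every row where the threshold reaches 5
crossed <- filled >= 5

# Step 3: cumsum() counts the TRUEs seen so far;
# adding 1 makes the counter start at 1 instead of 0
new_var <- cumsum(crossed) + 1

print(new_var)
```

Because the count is cumulative, each row only depends on how many rows so far have reached the threshold, so there is no need to look up the previous value of new_var explicitly. Grouping by ID is what makes the counter restart for every participant.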

With dplyr 1.1.0 or later, which added the .by argument, the grouped pipeline can also be written as a one-liner:

mutate(df, new_var = cumsum(coalesce(threshold, 0) >= 5) + 1, .by = ID)
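
As for new_var[-1] in the original attempt: in R a negative index removes elements rather than shifting them, so new_var[-1] is the vector without its first element. In a group of 3 rows that yields a vector of length 2, which matches the length complaint in the case_when() error. The small sketch below, using a made-up vector x, contrasts this with dplyr::lag(), which shifts values by one position and keeps the length.

```r
library(dplyr)

# hypothetical example vector, not from the question
x <- c(10, 20, 30)

# negative index: drops the first element, result has length 2
dropped <- x[-1]

# dplyr::lag(): shifts values down by one, pads with NA, result has length 3
shifted <- dplyr::lag(x)

print(dropped)
print(shifted)
```

Note that even lag() would not be enough for this problem on its own, since inside a single mutate() call it sees the column as it was before the call, not the values being computed row by row. That is why a cumulative function such as cumsum() is a better fit here.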