Creating a new variable by referring to the previous value, based on a condition from two other variables

March 12, 2023

I have been trying to create a variable based on multiple conditions from other variables, while referencing its previous value. Unfortunately nothing I tried seemed to work.
Any help would be greatly appreciated!

I have a dataframe like this:

df <- data.frame(
  ID = (rep(c(1, 2, 3), times = c(3, 6, 4))),
  threshold = c(NA, 2, 6, 
                NA, 2, 3, 7, 3, 7,
                NA, 7, 7, 2)
)

I am trying to create a new variable new_var that is 1 on the first row and keeps that value until threshold is greater than or equal to 5. When that happens, new_var should increase by one and stay there until the next time threshold is greater than or equal to 5. This rule should reset for every participant, so that the first entry for each participant starts at 1.
This is what the current example should look like:

df <- data.frame(
  ID = (rep(c(1, 2, 3), times = c(3, 6, 4))),
  threshold = c(NA, 2, 6, 
                NA, 2, 3, 7, 3, 7,
                NA, 7, 7, 2),
  new_var = c(1, 1, 2,
              1, 1, 1, 2, 2, 3,
              1, 2, 3, 3)
)


I have tried grouping by ID, and using case_when to define the different options based on threshold. It is worth mentioning that the value of threshold is always missing for the first row of each participant.
I have used this code:

library(tidyverse)

df1 <- df %>%
  group_by(ID) %>%
  mutate(
    new_var = NA, #first had to create an empty variable so later I can refer to it
    new_var = case_when(
      is.na(threshold) == TRUE ~ 1,
      threshold < 5 ~ new_var[-1],
      threshold >= 5 ~ new_var[-1] + 1
    )
  ) %>%
  ungroup()

However, I keep getting this error:

Error in mutate():
! Problem while computing new_var = case_when(...).
ℹ The error occurred in group 1: ID = 1.
Caused by error in case_when():
! threshold < 5 ~ new_var[-1], threshold >= 5 ~ new_var[-1] + 1 must be length 3 or one, not 2.
Backtrace:

… %>% ungroup()
dplyr::case_when(…)

As far as I understand, the problem is that when I try to refer to the previous value of new_var, R treats new_var[-1] as a whole vector instead of as a single value for each row. But maybe I’m wrong…
Is there a better way to refer to the previous value of a vector than by new_var[-1]?
Or perhaps a better approach to solving this whole puzzle?
I would be grateful to hear any insight!

Thank you!

>Solution :

You could try:

library(dplyr)

df %>%
  group_by(ID) %>%
  mutate(new_var = cumsum(coalesce(threshold, 0L) >= 5) + 1) %>%
  ungroup()

Output:

# A tibble: 13 × 3
      ID threshold new_var
   <dbl>     <dbl>   <dbl>
 1     1        NA       1
 2     1         2       1
 3     1         6       2
 4     2        NA       1
 5     2         2       1
 6     2         3       1
 7     2         7       2
 8     2         3       2
 9     2         7       3
10     3        NA       1
11     3         7       2
12     3         7       3
13     3         2       3
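
To see why this works, it may help to split the expression into steps. The sketch below rebuilds the same result with two helper columns, filled and crossed; those names are only for illustration and are not part of the original answer. The coalesce() step matters because a comparison with NA returns NA, and cumsum() then returns NA for that row and every row after it.

```r
library(dplyr)

df <- data.frame(
  ID = rep(c(1, 2, 3), times = c(3, 6, 4)),
  threshold = c(NA, 2, 6,
                NA, 2, 3, 7, 3, 7,
                NA, 7, 7, 2)
)

# Without coalesce(), the leading NA spreads through the running sum
cumsum(c(NA, 2, 6) >= 5)  # NA NA NA

steps <- df %>%
  group_by(ID) %>%
  mutate(
    filled  = coalesce(threshold, 0),  # replace NA with 0 so it never counts as a crossing
    crossed = filled >= 5,             # TRUE on rows where threshold reaches 5
    new_var = cumsum(crossed) + 1      # running count of crossings, starting at 1 per ID
  ) %>%
  ungroup()

steps$new_var  # 1 1 2 1 1 1 2 2 3 1 2 3 3
```

Because the count is a running sum within each ID, there is no need to look up the previous value of new_var at all.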

With dplyr 1.1.0 or later, the same thing can be written as a one-liner using the .by argument:

mutate(df, new_var = cumsum(coalesce(threshold, 0L) >= 5) + 1, .by = ID)
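
As for the side question about new_var[-1]: negative indexing removes the first element instead of shifting the vector, so in a group of 3 rows it returns 2 values, which matches the length error above. The function that shifts a vector by one position is dplyr::lag(). The small vector x below is made up for illustration.

```r
library(dplyr)

x <- c(10, 20, 30)

x[-1]          # drops the first element: 20 30 (length 2)
dplyr::lag(x)  # shifts by one position:  NA 10 20 (length 3)
```

Note that lag() alone would probably not be enough here, since mutate() computes the whole column at once and the earlier values of new_var do not exist yet while it is being built. That is why a cumulative function such as cumsum() fits this problem better.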

data-wrangling

by MR
