case_when doesn't work with multiple conditions over multiple variables

I just discovered that case_when might not work as expected when a variable is recoded based on multiple variables.

Reproducible data:

data <- data.frame(f103 = c(2, NA, NA, 1, 2, 2),
                   f76 = c(2, NA, NA, NA, 3, 3),
                   f4 = c(1, 3, 3, 1, 1, 2))

The following code produces the same results for var1 and var2 (which is not what I want):

library(dplyr)

data <- data %>%
  mutate(var1 = f4) %>% 
  mutate(var1 = case_when(f103 == 2 ~ 3, TRUE ~ as.numeric(var1))) %>%
  mutate(var2 = f4) %>% 
  mutate(var2 = case_when(f103 == 2 ~ 3, f76 == 1 ~ 1, f76 == 2 ~ 2, f76 == 3 ~ 3, TRUE ~ as.numeric(var2)))

The following produces the correct result (i.e., the solution to my problem):

data <- data %>%
  mutate(var1 = f4) %>% 
  mutate(var1 = case_when(f103 == 2 ~ 3, TRUE ~ as.numeric(var1))) %>%
  mutate(var2 = f4) %>% 
  mutate(var2 = case_when(f103 == 2 ~ 3, TRUE ~ as.numeric(var2))) %>%
  mutate(var2 = case_when(f76 == 1 ~ 1, f76 == 2 ~ 2, f76 == 3 ~ 3, TRUE ~ as.numeric(var2)))

(I am aware that in this snippet of my data, the f103 condition for var1 is superfluous, still, I wouldn’t expect it to cause this issue.)

I’d be interested to know if someone can explain to me why this problem occurs and how to prevent it in the future.

>Solution :

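It has to do with how case_when evaluates: conditions are checked from top to bottom, and for each row the first one that is TRUE wins, so a later condition cannot overwrite an earlier match. In your single call, f103 == 2 comes first, so those rows never reach the f76 conditions. In your two-step version, the f76 recode runs in a separate mutate afterwards and therefore overwrites the f103 result. I.e.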

f76 is listed first, so f76 wins (what you want):

library(dplyr)

data |>
    mutate(var1 = case_when(f103 == 2 ~ 3,
                            TRUE ~ f4)) |>
    mutate(var2 = case_when(f76 %in% 1:3 ~ f76,
                            f103 == 2 ~ 3, # NB!
                            TRUE ~ f4))
  f103 f76 f4 var1 var2
1    2   2  1    3    2
2   NA  NA  3    3    3
3   NA  NA  3    3    3
4    1  NA  1    1    1
5    2   3  1    3    3
6    2   3  2    3    3

f103 is listed first, so f103 wins (what you got, and not what you want):

library(dplyr)

data |>
    mutate(var1 = case_when(f103 == 2 ~ 3,
                            TRUE ~ f4)) |>
    mutate(var2 = case_when(f103 == 2 ~ 3, # NB!
                            f76 %in% 1:3 ~ f76,
                            TRUE ~ f4))
  f103 f76 f4 var1 var2
1    2   2  1    3    3
2   NA  NA  3    3    3
3   NA  NA  3    3    3
4    1  NA  1    1    1
5    2   3  1    3    3
6    2   3  2    3    3
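
As a quick check of the ordering rule, here is a minimal sketch that compares the question's two-step workaround with a single case_when call in which the f76 condition is listed first. It assumes dplyr is installed and reuses the question's data; the object names two_step and single_call are made up for this illustration.

```r
library(dplyr)

data <- data.frame(f103 = c(2, NA, NA, 1, 2, 2),
                   f76  = c(2, NA, NA, NA, 3, 3),
                   f4   = c(1, 3, 3, 1, 1, 2))

# Two-step version from the question: the f76 recode runs in a second
# mutate, so it overwrites whatever the f103 recode produced.
two_step <- data %>%
  mutate(var2 = case_when(f103 == 2 ~ 3, TRUE ~ f4)) %>%
  mutate(var2 = case_when(f76 == 1 ~ 1, f76 == 2 ~ 2, f76 == 3 ~ 3,
                          TRUE ~ var2))

# Single call: conditions are checked top to bottom and the first TRUE
# wins, so the rule that should take priority (f76) is listed first.
single_call <- data %>%
  mutate(var2 = case_when(f76 %in% 1:3 ~ f76,
                          f103 == 2 ~ 3,
                          TRUE ~ f4))

# Both approaches should give the same var2 column.
identical(two_step$var2, single_call$var2)
```

The practical rule of thumb is to order the conditions from highest priority to lowest, and to put the catch-all TRUE ~ ... last.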