Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Difficulty using case_when() to add column that, conditionally, pastes value from another column

I’m trying to use case_when() and mutate() for data wrangling, but I’m running into difficulty: I want to add a new column (new_var) such that if var_1 contains "A" then new_var will contain "A". But if var_1 contains "B" or (real) NAs (indeed anything other than "A") then new_var will become whatever value is found in var_2.

df <- tibble(var_1=c('A','B','A',NA,'B','B',NA,'B'), 
var_2=c('MM','QQ','ZZ','GG','MM','QQ','ZZ','GG'))

 A tibble: 8 Ă— 2
  var_1 var_2
  <chr> <chr>
1 A     MM   
2 B     QQ   
3 A     ZZ   
4 NA    GG   
5 B     MM   
6 B     QQ   
7 NA    ZZ   
8 B     GG   

My desired outcome is this:

# A tibble: 8 Ă— 3
  var_1 var_2 new_var
  <chr> <chr> <chr>  
1 A     MM    A      
2 B     QQ    QQ     
3 A     ZZ    A      
4 NA    GG    GG     
5 B     MM    MM     
6 B     QQ    QQ     
7 NA    ZZ    ZZ     
8 B     GG    GG  

I’ve reviewed numerous questions involving case_when() and mutate(), but none address how to achieve new_var becoming whatever value is found in var_2. This is my first attempt to use case_when().

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I’ve got the general approach to creating the new column, but then I’m stuck:

library(dplyr)

df %>%
  mutate(new_var = case_when(var_1 == 'A' ~ 'A',
         is.na(var_1) ~ var_2,  ##obv ~var_2 doesn't work
         var_1 == 'B' ~ var_2),
         TRUE = 'Unknown')

Because I hope to include this in a longer pipe, I’d prefer a solution using dplyr. Thanks in advance!

>Solution :

You have syntax errors above, including unbalanced parentheses, and also the final TRUE condition in case_when() should also use tilde. Making all these changes, it seems to be working:

df <- df %>%
    mutate(new_var = case_when(
        var_1 == "A" ~ "A",
        is.na(var_1) ~ var_2,
        var_1 == "B" ~ var_2,
        TRUE ~ "Unknown")
    )

df

# A tibble: 8 x 3
  var_1 var_2 new_var
  <chr> <chr> <chr>
1 A     MM    A
2 B     QQ    QQ
3 A     ZZ    A
4 <NA>  GG    GG
5 B     MM    MM
6 B     QQ    QQ
7 <NA>  ZZ    ZZ
8 B     GG    GG
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading