Creating a new column based on values from several other columns, using mutate() and case_when() in R

I am a student relatively new to R and have learnt a lot from browsing here. I have recently been stuck on something that, after hours of trying, I still haven’t been able to figure out. Consider the following data set:

ID  Y1  Y2  Y3  Y4
1   0   0   1   1
2   0   0   0   0
3   NA  NA  NA  NA

I want to create a new column that is filled according to the following conditions:

  1. If the row contains 1, return 1 regardless of NA or 0
  2. If it contains a mix of 0 and NA but not 1, return 0
  3. If it only contains NA, return NA

So using the example above I wanted to get the following:

ID  Y1  Y2  Y3  Y4  Outcome
1   0   0   1   1   1
2   0   0   0   0   0
3   NA  NA  NA  NA  NA

However, the code I tried:

Data2 <- Data %>% mutate(Outcome = case_when( 
                                Data$Y1 == "na" &
                                Data$Y2 == "na" &
                                Data$Y3 == "na" &
                                Data$Y4 == "na" ~ "na"))  %>%                                
          mutate(Outcome = case_when(Data$Y1 == 1 ~ "1", 
                                 Data$Y2 == 1 ~ "1", 
                                 Data$Y3 == 1 ~ "1",
                                 Data$Y4 == 1 ~ "1",
                                 TRUE ~ "No"))

will return with:

ID  Y1  Y2  Y3  Y4  Outcome
1   0   0   1   1   1
2   0   0   0   0   0
3   NA  NA  NA  NA  0

which seems to ignore condition 3: if the row contains only NA, return NA.

Any pointers as to what I have done wrong would be greatly appreciated.

Please forgive the formatting; I’m not sure how I could make it prettier, as this is the first time I have asked a question here.

Many thanks in advance!

[Edit] Thanks to Shah, I noticed that there is potential for confusion, for which I apologise. To clarify, this is just a segment of the data set to get the point across. I’m dealing with a big dataset that contains more columns, some of which also hold numeric values.

Solution:

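Your attempt fails for two reasons. First, a missing value never equals the string "na": NA == "na" evaluates to NA, so the test has to be is.na(). Second, the second mutate() overwrites the Outcome column created by the first, and its TRUE fallback also catches the all-NA rows. You can instead use dplyr's rowwise() function, which treats each row separately, and put all three conditions in a single case_when():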

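library(dplyr)

df |>
  rowwise() |>                                   # evaluate the conditions one row at a time
  mutate(Outcome = case_when(
    any(c_across(Y1:Y4) == 1, na.rm = TRUE) ~ "1",            # rule 1: any 1 in the row
    all(is.na(c_across(Y1:Y4)))             ~ NA_character_,  # rule 3: only NA in the row
    TRUE                                    ~ "0"             # rule 2: 0s, possibly with NA
  ))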

  • output
# A tibble: 3 × 6
# Rowwise: 
     ID    Y1    Y2    Y3    Y4 Outcome
  <int> <int> <int> <int> <int> <chr>  
1     1     0     0     1     1 1      
2     2     0     0     0     0 0      
3     3    NA    NA    NA    NA NA     
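
For a big dataset, rowwise() can be slow because the conditions are evaluated once per row. A vectorised variant of the same three rules might look like the sketch below. It assumes dplyr 1.0.4 or later (for if_any() and if_all()) and returns an integer Outcome rather than a character one. Rows 4 and 5 are hypothetical additions to the question's data, included only to exercise the mixed 0/NA and 1/NA cases.

```r
library(dplyr)

# Data from the question, plus two hypothetical mixed rows (IDs 4 and 5)
df <- tibble(
  ID = 1:5,
  Y1 = c(0L, 0L, NA, NA, NA),
  Y2 = c(0L, 0L, NA, 0L, 1L),
  Y3 = c(1L, 0L, NA, NA, NA),
  Y4 = c(1L, 0L, NA, 0L, 0L)
)

result <- df |>
  mutate(Outcome = case_when(
    # Rule 1: at least one 1 in the row; NA %in% 1 is FALSE, so NAs never match
    if_any(Y1:Y4, ~ .x %in% 1) ~ 1L,
    # Rule 3: every selected column is missing in this row
    if_all(Y1:Y4, is.na) ~ NA_integer_,
    # Rule 2: whatever remains holds only 0s, possibly mixed with NA
    TRUE ~ 0L
  ))

print(result)
```

If the real data has many such columns, a tidyselect helper such as starts_with("Y") could replace Y1:Y4, assuming the relevant columns share that prefix and no unrelated column does.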