Keep the NAs in if_else

I have a data frame like this:

ID diagnosis   A1   A2   A3
a       yes    A    A    B
b       yes    B    C    D
c        no <NA>    C <NA>
d        no    E    C    D
e       yes    D <NA>    B

Here A1, A2, and A3 are the questions in my test, and the letters below them are the answers participants gave. I want to create a new column per question indicating whether each answer is correct: 1 if it is correct and 0 if it is not. Some questions have two correct answers. This is the dplyr code I used and the result I got:

mydf <- mydf %>%
  mutate(A1.1 = if_else(A1 %in% c("A"), 1, 0)) %>%
  mutate(A2.1 = if_else(A2 %in% c("A", "B"), 1, 0)) %>%
  mutate(A3.1 = if_else(A3 %in% c("A", "B"), 1, 0))

ID diagnosis   A1   A2   A3 A1.1 A2.1 A3.1
a       yes    A    A    B    1    1    1
b       yes    B    C    D    0    0    0
c        no <NA>    C <NA>    0    0    0
d        no    E    C    D    0    0    0
e       yes    D <NA>    B    0    0    1

As you can see, the NA values turned into 0, but I want to keep them as NA. So my first question is: how can I keep the NAs?

My second question is whether you can think of a shorter way to create those columns from the answers in the other columns, because my real data has 30 questions 🙂

Thank you so much!

Solution:

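%in% never returns NA: it returns FALSE where the left-hand value is NA, so if_else() sends those rows to 0. == propagates NA instead, and the unary + converts the logical result to integer (TRUE becomes 1, FALSE becomes 0, NA stays NA). We could use ==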

library(dplyr)
mydf %>%
   mutate(A1.1 = +(A1 == "A"), A2.1 = +(A2 == "A"|A2 == "B"),
     A3.1 = +(A3 == "A"|A3 == "B") )

-output

  ID diagnosis   A1   A2   A3 A1.1 A2.1 A3.1
1  a       yes    A    A    B    1    1    1
2  b       yes    B    C    D    0    0    0
3  c        no <NA>    C <NA>   NA    0   NA
4  d        no    E    C    D    0    0    0
5  e       yes    D <NA>    B    0   NA    1
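
The difference between %in% and == on missing values can be checked in isolation. This is a minimal base R sketch; the vector x and the result names are made up for illustration and are not part of the question's data.

```r
# Illustrative vector with one missing answer (hypothetical data)
x <- c("A", "B", NA)

# %in% treats NA as "not found" and returns FALSE for it
in_result <- x %in% c("A")

# == propagates the missing value
eq_result <- x == "A"

# Unary + turns the logical vector into integers and keeps NA
as_int <- +(x == "A")

print(in_result)  # TRUE FALSE FALSE
print(eq_result)  # TRUE FALSE    NA
print(as_int)     #    1     0    NA
```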

If more than one column uses the same comparison, use across() to loop over them:

mydf %>%
  mutate(A1.1 = +(A1 == "A"), across(A2:A3,
      ~ +(.x == "A"|.x == "B"), .names = "{.col}.1"))

-output

  ID diagnosis   A1   A2   A3 A1.1 A2.1 A3.1
1  a       yes    A    A    B    1    1    1
2  b       yes    B    C    D    0    0    0
3  c        no <NA>    C <NA>   NA    0   NA
4  d        no    E    C    D    0    0    0
5  e       yes    D <NA>    B    0   NA    1
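
With 30 questions that each have their own set of correct answers, one option is to store the correct answers in a named list and look up the entry for the current column inside across(). The following is a sketch under assumptions, not part of the original answer: answer_key and scored are hypothetical names, the key values are taken from the question's three columns, and cur_column() needs dplyr 1.0.0 or later. Because %in% turns NA into FALSE, the NA is restored explicitly with is.na().

```r
library(dplyr)

# Same example data as in the question
mydf <- data.frame(
  ID = c("a", "b", "c", "d", "e"),
  diagnosis = c("yes", "yes", "no", "no", "yes"),
  A1 = c("A", "B", NA, "E", "D"),
  A2 = c("A", "C", "C", "C", NA),
  A3 = c("B", "D", NA, "D", "B"),
  stringsAsFactors = FALSE
)

# Hypothetical answer key: one entry per question column,
# each holding every answer that counts as correct
answer_key <- list(
  A1 = "A",
  A2 = c("A", "B"),
  A3 = c("A", "B")
)

scored <- mydf %>%
  mutate(across(
    all_of(names(answer_key)),
    # Keep NA where the answer is missing; otherwise score 1/0
    # against the key of the column currently being processed
    ~ if_else(is.na(.x),
              NA_integer_,
              as.integer(.x %in% answer_key[[cur_column()]])),
    .names = "{.col}.1"
  ))

print(scored)
```

Extending this to more questions only requires adding entries to answer_key; the mutate() call stays the same.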

data

mydf <- structure(list(ID = c("a", "b", "c", "d", "e"), diagnosis = c("yes", 
"yes", "no", "no", "yes"), A1 = c("A", "B", NA, "E", "D"), A2 = c("A", 
"C", "C", "C", NA), A3 = c("B", "D", NA, "D", "B")),
 class = "data.frame", row.names = c(NA, 
-5L))