Keep the NAs in if_else

September 12, 2022

I have a data frame like this:

ID diagnosis   A1   A2   A3
a       yes    A    A    B
b       yes    B    C    D
c        no <NA>    C <NA>
d        no    E    C    D
e       yes    D <NA>    B

Here A1, A2, and A3 are the questions in my test, and the letters below them are the answers that participants gave. I want to create a new column per question indicating whether each answer is correct: 1 if it is correct and 0 if it is not. Some questions have two correct answers. This is the dplyr code I used and the result I got:

library(dplyr)

mydf <- mydf %>%
  mutate(A1.1 = if_else(A1 %in% c("A"), 1, 0)) %>%
  mutate(A2.1 = if_else(A2 %in% c("A", "B"), 1, 0)) %>%
  mutate(A3.1 = if_else(A3 %in% c("A", "B"), 1, 0))

ID diagnosis   A1   A2   A3 A1.1 A2.1 A3.1
a       yes    A    A    B    1    1    1
b       yes    B    C    D    0    0    0
c        no <NA>    C <NA>    0    0    0
d        no    E    C    D    0    0    0
e       yes    D <NA>    B    0    0    1

As you can see, the NA values turned into 0, but I want to keep them as NA. So my first question is: how can I keep the NAs?

My second question is whether there is a shorter way to create these columns from the answer columns, because my real data has 30 questions 🙂

Thank you so much!

Solution:

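%in% never returns NA: it is built on match(), so an NA with no match in the right-hand set simply gives FALSE, which if_else() then turns into 0. == propagates NA instead, so we could use that. The unary + converts the logical result to integer (TRUE becomes 1, FALSE becomes 0, NA stays NA).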

library(dplyr)
mydf %>%
   mutate(A1.1 = +(A1 == "A"), A2.1 = +(A2 == "A"|A2 == "B"),
     A3.1 = +(A3 == "A"|A3 == "B") )

-output

 ID diagnosis   A1   A2   A3 A1.1 A2.1 A3.1
1  a       yes    A    A    B    1    1    1
2  b       yes    B    C    D    0    0    0
3  c        no <NA>    C <NA>   NA    0   NA
4  d        no    E    C    D    0    0    0
5  e       yes    D <NA>    B    0   NA    1
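
To make the difference concrete, here is a minimal base R sketch of how the two operators treat a missing answer. It also shows one possible way to keep %in%, which is convenient when a question has several correct answers, while restoring the NAs afterwards. The helper name `score` is made up for illustration and is not part of dplyr or base R.

```r
x <- c("A", "B", NA, "E", "D")

# %in% is built on match(): the NA is just "not found", so the result is FALSE
x %in% c("A")

# == propagates the missing value, so the third element stays NA
x == "A"

# Hypothetical helper: score with %in%, then put the NAs back
score <- function(x, key) {
  out <- as.integer(x %in% key)  # 1 if the answer is in the key, else 0
  out[is.na(x)] <- NA_integer_   # unanswered questions stay NA
  out
}

score(x, c("A", "B"))
```

With this helper the key can be a vector of any length, so there is no need to chain several == comparisons with |.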

If more than one column uses the same comparison, use across() to loop over those columns:

mydf %>%
  mutate(A1.1 = +(A1 == "A"),
         across(A2:A3, ~ +(.x == "A" | .x == "B"), .names = "{.col}.1"))

-output

  ID diagnosis   A1   A2   A3 A1.1 A2.1 A3.1
1  a       yes    A    A    B    1    1    1
2  b       yes    B    C    D    0    0    0
3  c        no <NA>    C <NA>   NA    0   NA
4  d        no    E    C    D    0    0    0
5  e       yes    D <NA>    B    0   NA    1
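
With 30 questions that each have their own correct answers, one option is to store the answers in a named list and look up the entry for the current column inside across(). The following is only a sketch under that assumption: the `key` list and its contents are invented here to match the three example columns, and the real key would need one entry per question.

```r
library(dplyr)

mydf <- data.frame(
  ID = c("a", "b", "c", "d", "e"),
  A1 = c("A", "B", NA, "E", "D"),
  A2 = c("A", "C", "C", "C", NA),
  A3 = c("B", "D", NA, "D", "B")
)

# Hypothetical answer key: names must match the question columns
key <- list(A1 = "A", A2 = c("A", "B"), A3 = c("A", "B"))

scored <- mydf %>%
  mutate(across(
    all_of(names(key)),
    # cur_column() gives the name of the column being processed,
    # so each question is compared against its own set of answers
    ~ if_else(is.na(.x), NA_integer_,
              as.integer(.x %in% key[[cur_column()]])),
    .names = "{.col}.1"
  ))

scored
```

Because the lookup is driven by the names in `key`, adding a question only means adding an entry to the list; the mutate() call itself does not change.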

data

mydf <- structure(list(ID = c("a", "b", "c", "d", "e"), diagnosis = c("yes", 
"yes", "no", "no", "yes"), A1 = c("A", "B", NA, "E", "D"), A2 = c("A", 
"C", "C", "C", NA), A3 = c("B", "D", NA, "D", "B")),
 class = "data.frame", row.names = c(NA, 
-5L))