I have a data frame like this:
ID diagnosis A1 A2 A3
a yes A A B
b yes B C D
c no <NA> C <NA>
d no E C D
e yes D <NA> B
Here A1, A2, and A3 are the questions in my test, and the letters below them are the answers the participants gave. I want to create a new column per question indicating whether the answer is correct: 1 if it is, 0 if it is not. Some questions have two correct answers. This is the dplyr code I used and the result I got:
mydf <- mydf %>%
  mutate(A1.1 = if_else(A1 %in% c("A"), 1, 0)) %>%
  mutate(A2.1 = if_else(A2 %in% c("A", "B"), 1, 0)) %>%
  mutate(A3.1 = if_else(A3 %in% c("A", "B"), 1, 0))
ID diagnosis A1 A2 A3 A1.1 A2.1 A3.1
a yes A A B 1 1 1
b yes B C D 0 0 0
c no <NA> C <NA> 0 0 0
d no E C D 0 0 0
e yes D <NA> B 0 0 1
As you can see, the NA values turned into 0, but I want to keep them as NA. So my first question is: how can I keep the NAs?
My second question is whether you can think of a shorter way to create those columns from the answers in the other columns, because my real data has 30 questions 🙂
Thank you so much!
Solution:
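%in% never returns NA: it returns FALSE for NA elements, which is why if_else() turns them into 0. == propagates NA instead, so we could use it here. The unary + converts the logical result to integer (TRUE becomes 1, FALSE becomes 0, NA stays NA).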
library(dplyr)
mydf %>%
  mutate(A1.1 = +(A1 == "A"),
         A2.1 = +(A2 == "A" | A2 == "B"),
         A3.1 = +(A3 == "A" | A3 == "B"))
Output:
ID diagnosis A1 A2 A3 A1.1 A2.1 A3.1
1 a yes A A B 1 1 1
2 b yes B C D 0 0 0
3 c no <NA> C <NA> NA 0 NA
4 d no E C D 0 0 0
5 e yes D <NA> B 0 NA 1
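To see why the two operators treat missing values differently, here is a small base R sketch on a toy vector. The vector x and the variable names are made up for illustration; the last line shows one possible way to keep %in% (convenient when there are several correct answers) while restoring the NAs afterwards.

```r
# Toy vector with one missing answer (hypothetical data)
x <- c("A", NA, "B")

# %in% never returns NA: the missing element becomes FALSE
with_in <- x %in% c("A", "B")   # TRUE FALSE TRUE

# == propagates NA: the missing element stays NA
with_eq <- x == "A"             # TRUE NA FALSE

# Keep %in% but put NA back wherever the input was NA
restored <- ifelse(is.na(x), NA_integer_, as.integer(x %in% c("A", "B")))
restored                        # 1 NA 1
```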
If more than one column uses the same comparison, use across() to loop over those columns:
mydf %>%
  mutate(A1.1 = +(A1 == "A"),
         across(A2:A3,
                ~ +(.x == "A" | .x == "B"),
                .names = "{.col}.1"))
ID diagnosis A1 A2 A3 A1.1 A2.1 A3.1
1 a yes A A B 1 1 1
2 b yes B C D 0 0 0
3 c no <NA> C <NA> NA 0 NA
4 d no E C D 0 0 0
5 e yes D <NA> B 0 NA 1
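With 30 questions that each have their own correct answers, one option is to store the answers in a named list and look them up by column name inside across(). The following is only a sketch under assumptions: answer_key is a hypothetical list introduced here (it is not part of the original answer), and it uses dplyr's cur_column() to find the key entry for the column being processed. The data frame is rebuilt inline so the block runs on its own.

```r
library(dplyr)

# Same example data as in the question
mydf <- data.frame(
  ID = c("a", "b", "c", "d", "e"),
  diagnosis = c("yes", "yes", "no", "no", "yes"),
  A1 = c("A", "B", NA, "E", "D"),
  A2 = c("A", "C", "C", "C", NA),
  A3 = c("B", "D", NA, "D", "B")
)

# Hypothetical answer key: one entry per question column,
# each holding every answer that counts as correct
answer_key <- list(A1 = "A", A2 = c("A", "B"), A3 = c("A", "B"))

scored <- mydf %>%
  mutate(across(
    all_of(names(answer_key)),
    # NA stays NA; otherwise 1 if the answer is in the key for this column, else 0
    ~ if_else(is.na(.x), NA_integer_,
              as.integer(.x %in% answer_key[[cur_column()]])),
    .names = "{.col}.1"
  ))

scored
```

Adding a question then only means adding an entry to answer_key; the mutate() call itself stays unchanged.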
Data:
mydf <- structure(list(ID = c("a", "b", "c", "d", "e"), diagnosis = c("yes",
"yes", "no", "no", "yes"), A1 = c("A", "B", NA, "E", "D"), A2 = c("A",
"C", "C", "C", NA), A3 = c("B", "D", NA, "D", "B")),
class = "data.frame", row.names = c(NA,
-5L))