I have a data frame as follows, and I would like to count the number of "yes" values in each row.
have = data.frame(x1 = c("yes", "no", NA, "yes", "yes", "yes", NA, "no"),
x2 = c("no", "yes", "no", NA, "no", "yes", NA, NA),
x3 = c(NA, NA, NA, "yes", "yes", "yes", NA, "yes"),
x4 = c("no", "yes", "no", "no", "no", "no", NA, "no"),
x5 = c(NA, "no", "no", "no", "no", NA, NA, "no"))
want = data.frame(have,
count_yes = c(1, 2, 0, 2, 2, 3, 0, 1))
Here is my attempt!
attempt = as.data.frame(
have %>%
mutate(count_yes_all = str_count(x1, "yes", na.rm=TRUE) +
str_count(x2, "yes", na.rm=TRUE) +
str_count(x3, "yes", na.rm=TRUE) +
str_count(x4, "yes", na.rm=TRUE) +
str_count(x5, "yes", na.rm=TRUE))
)
Two things:
- How can I deal with NA(s)?
- I have over 20 variables that start with "x". Rather than writing out 20 lines of code, how can I do this more concisely?
Many thanks in advance.
>Solution :
Use rowSums with na.rm = TRUE to deal with NAs.
If you want to restrict the count to specific columns (e.g. all columns that start with "x"), use across instead of ., e.g. across(starts_with("x")) or across(x1:x5).
library(dplyr)
have %>%
  mutate(count_yes = rowSums(. == "yes", na.rm = TRUE))
    x1   x2   x3   x4   x5 count_yes
1  yes   no <NA>   no <NA>         1
2   no  yes <NA>  yes   no         2
3 <NA>   no <NA>   no   no         0
4  yes <NA>  yes   no   no         2
5  yes   no  yes   no   no         2
6  yes  yes  yes   no <NA>         3
7 <NA> <NA> <NA> <NA> <NA>         0
8   no <NA>  yes   no   no         1
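For the case with 20+ columns, the across variant mentioned above could look roughly like the sketch below. It assumes dplyr is installed and that every column to be counted has a name starting with "x"; the name result is only illustrative. Comparing the selected columns to "yes" gives a logical matrix in which NA cells stay NA, and na.rm = TRUE drops those cells from each row's sum.

```r
library(dplyr)

# Same data as `have` in the question.
have <- data.frame(
  x1 = c("yes", "no", NA, "yes", "yes", "yes", NA, "no"),
  x2 = c("no", "yes", "no", NA, "no", "yes", NA, NA),
  x3 = c(NA, NA, NA, "yes", "yes", "yes", NA, "yes"),
  x4 = c("no", "yes", "no", "no", "no", "no", NA, "no"),
  x5 = c(NA, "no", "no", "no", "no", NA, NA, "no")
)

# Select only the columns whose names start with "x", compare each cell
# to "yes", and count the TRUE values per row. NA cells compare to NA
# and are ignored because of na.rm = TRUE.
result <- have %>%
  mutate(count_yes = rowSums(across(starts_with("x")) == "yes", na.rm = TRUE))

# `result` is a hypothetical name used for this sketch only.
print(result)
```

With the sample data this should reproduce the count_yes column from want. Selecting by name prefix means other columns in the data frame (IDs, dates, and so on) are left out of the comparison. In dplyr 1.1.0 and later, pick(starts_with("x")) can be used in place of across(starts_with("x")) for the same column selection.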