Conditionally count values on multiple variables in R


I have a data frame as follows, and I would like to count the number of "yes" values in each row.

have = data.frame(x1 = c("yes", "no", NA, "yes", "yes", "yes", NA, "no"),
                  x2 = c("no", "yes", "no", NA, "no", "yes", NA, NA),
                  x3 = c(NA, NA, NA, "yes", "yes", "yes", NA, "yes"),
                  x4 = c("no", "yes", "no", "no", "no", "no", NA, "no"),
                  x5 = c(NA, "no", "no", "no", "no", NA, NA, "no"))

want = data.frame(have,
                  count_yes = c(1, 2, 0, 2, 2, 3, 0, 1))

Here is my attempt!

attempt = as.data.frame(
  have %>% 
    mutate(count_yes_all = str_count(x1, "yes", na.rm=TRUE) +
             str_count(x2, "yes", na.rm=TRUE) + 
             str_count(x3, "yes", na.rm=TRUE) + 
             str_count(x4, "yes", na.rm=TRUE) + 
             str_count(x5, "yes", na.rm=TRUE))
  )

Two things:

  1. How can I deal with the NAs?
  2. I have over 20 variables that start with "x". Rather than writing one line of code per variable, how can I write this more concisely?

Many thanks in advance.

Solution:

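Use rowSums() with na.rm = TRUE to deal with the NAs. Comparing the data frame to "yes" gives a logical matrix, and rowSums() then counts the TRUE values in each row while skipping the NAs.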

To restrict the count to specific columns (e.g. all columns that start with "x"), use across() instead of ., e.g. across(starts_with("x")) or across(x1:x5).

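library(dplyr)

# . is the whole data frame; . == "yes" is a logical matrix (NA where the value is NA)
have %>% 
  mutate(count_yes = rowSums(. == "yes", na.rm = TRUE))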

    x1   x2   x3   x4   x5 count_yes
1  yes   no <NA>   no <NA>         1
2   no  yes <NA>  yes   no         2
3 <NA>   no <NA>   no   no         0
4  yes <NA>  yes   no   no         2
5  yes   no  yes   no   no         2
6  yes  yes  yes   no <NA>         3
7 <NA> <NA> <NA> <NA> <NA>         0
8   no <NA>  yes   no   no         1
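
The column-restricted variant mentioned above could look roughly like the sketch below. It assumes dplyr is installed and that every column to be counted starts with "x"; the name `result` is only for illustration. With just the five `x` columns in the example it should give the same counts as the version using `.`, but it would keep working if the data frame also held other columns that should not be counted.

```r
library(dplyr)

# Example data from the question
have <- data.frame(
  x1 = c("yes", "no", NA, "yes", "yes", "yes", NA, "no"),
  x2 = c("no", "yes", "no", NA, "no", "yes", NA, NA),
  x3 = c(NA, NA, NA, "yes", "yes", "yes", NA, "yes"),
  x4 = c("no", "yes", "no", "no", "no", "no", NA, "no"),
  x5 = c(NA, "no", "no", "no", "no", NA, NA, "no")
)

# across(starts_with("x")) selects only the x* columns;
# comparing them to "yes" gives a logical matrix, and
# rowSums(..., na.rm = TRUE) counts the TRUE values per row, ignoring NA
result <- have %>%
  mutate(count_yes = rowSums(across(starts_with("x")) == "yes", na.rm = TRUE))

print(result)
```

Selecting by range, as in across(x1:x5), is an alternative when the columns are adjacent but do not share a prefix.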
