Let’s say I work in a psychological context and want to know how many risk factors each patient has. After that, I would like to list those risks for each patient and then find the most prevalent risk (the mode). I’m thinking of using mutate with paste0 to get the column name whenever the value in a row is "risk", but I’m having a hard time with that.
Any help is appreciated.
Code is below:
library(tidyverse)
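df = data.frame(
  patient = 1:60,
  # each two-element vector is recycled to fill all 60 rows
  cancer = c("risk", "ok"),
  blood_pres = c("risk", "ok"),
  low_education = c("risk", "ok")
)
# count the columns whose value is "risk" in each row
df = df %>% mutate(how_many_risks = rowSums(. == "risk"))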
>Solution :
The c_across() function is what you are missing. Using your example data:
risk_factors <- c('cancer', 'blood_pres', 'low_education')
df <- df %>%
  rowwise() %>%
  mutate(how_many_risks = sum(c_across(all_of(risk_factors)) == "risk"),
         what_risks = paste0(risk_factors[which(c_across(all_of(risk_factors)) == "risk")], collapse = ";")) %>%
  ungroup()
You could add an extra line of logic to report the empty cases as ‘none’ (as in your example) with:
df2 <- df %>%
mutate(what_risks = if_else(what_risks == "", "none", what_risks))
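To see what these two steps produce, here is a minimal sketch on a made-up three-patient data frame. The `toy` data and the `toy_result` name are hypothetical and only serve as illustration; the column names follow the question.

```r
library(dplyr)

# Hypothetical toy data: three patients with different risk profiles
toy <- data.frame(
  patient = 1:3,
  cancer = c("risk", "ok", "ok"),
  blood_pres = c("ok", "ok", "risk"),
  low_education = c("risk", "ok", "ok")
)
risk_factors <- c("cancer", "blood_pres", "low_education")

toy_result <- toy %>%
  rowwise() %>%
  mutate(
    # number of risk columns equal to "risk" in this row
    how_many_risks = sum(c_across(all_of(risk_factors)) == "risk"),
    # names of those columns, joined with ";" (empty string if there are none)
    what_risks = paste0(risk_factors[c_across(all_of(risk_factors)) == "risk"], collapse = ";")
  ) %>%
  ungroup() %>%
  # label patients without any risk as "none"
  mutate(what_risks = if_else(what_risks == "", "none", what_risks))

toy_result$how_many_risks  # 2 0 1
toy_result$what_risks      # "cancer;low_education" "none" "blood_pres"
```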
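Based on the OP’s comments, assuming the risk variables all begin with "risk", the name vector no longer has to be typed by hand:
# take the names from the matching columns only, so the positions
# returned by c_across() line up with the names being indexed
risk_cols <- names(df)[startsWith(names(df), "risk")]
df <- df %>%
  rowwise() %>%
  mutate(how_many_risks = sum(c_across(all_of(risk_cols)) == "risk"),
         what_risks = paste0(risk_cols[c_across(all_of(risk_cols)) == "risk"], collapse = ";")) %>%
  ungroup()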
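The question also asks for the most prevalent risk (the mode), which the code above does not cover. One possible approach, sketched here in base R in the same spirit as the rowSums() call in the question, is to count "risk" per column with colSums() and keep the column or columns with the highest count. The `toy` data, `risk_counts` and `most_prevalent` are hypothetical names used for illustration only.

```r
# Hypothetical toy data with unequal risk frequencies
toy <- data.frame(
  patient = 1:6,
  cancer = c("risk", "ok", "ok", "ok", "risk", "ok"),
  blood_pres = c("risk", "risk", "ok", "risk", "risk", "ok"),
  low_education = c("ok", "ok", "ok", "risk", "ok", "ok")
)
risk_factors <- c("cancer", "blood_pres", "low_education")

# Number of patients flagged "risk" for each factor (named numeric vector)
risk_counts <- colSums(toy[risk_factors] == "risk")

# Factor(s) with the highest count; ties return more than one name
most_prevalent <- names(risk_counts)[risk_counts == max(risk_counts)]

risk_counts     # cancer 2, blood_pres 4, low_education 1
most_prevalent  # "blood_pres"
```

Comparing against max() rather than using which.max() keeps every factor in the result when two or more are tied for first place.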
