Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Using `case_when` and `mutate` to search multiple columns for conditional

I am trying to create a new column in my data frame (NEW) using the case_when functionality in dplyr. I am able to get the code below to run, but I am wondering if there is a way to create this new column based on the four columns that start with COL_ as opposed to how it is currently written, looking only at COL_1. Otherwise, I would have to repeat each case four times (once for each COL_1, COL_2, COL_3, and COL_4).

library(dplyr)
set.seed(1)

# Make sample data
data <- data.frame(STRATUM_ID = c(rep("C19", 5), rep("C20", 15), rep("C21", 4)),
                   COL_1 = sample(c(rep("X", 3), rep("T", 2), rep("Y", 7), rep("Z", 5), rep("D", 5), rep("G", 2)), 24, replace = T),
                   COL_2 = sample(c(rep("T", 4), rep("G", 6), rep("Y", 3), rep("C", 2), rep("W", 6), rep("R", 3)), 24, replace = T),
                   COL_3 = sample(c(rep("G", 1), rep("F", 5), rep("D", 3), rep("Z", 7), rep("C", 3), rep("E", 5)), 24, replace = T),
                   COL_4 = sample(c(rep("E", 7), rep("G", 2), rep("Y", 7), rep("D", 5), rep("V", 1), rep("U", 2)), 24, replace = T))

# Create new column based on COL columns
data <- data %>% mutate(NEW = case_when(
  STRATUM_ID == "C20" & COL_1 == "X" ~ "Class_A",
  STRATUM_ID == "C20" & COL_1 %in% c("C", "D", "E") ~ "Class_B",
  STRATUM_ID == "C20" & COL_1 %in% c("U", "V", "W", "Y") ~ "Class_C",
  STRATUM_ID == "C20" & COL_1 == "T" ~ "Class_D",
  STRATUM_ID == "C20" & COL_1 %in% c("G", "Z", "R") ~ "Class_E",
  STRATUM_ID == "C20" & COL_1 == "F" ~ "Class_F",
  STRATUM_ID == "C20" & is.na(data$COL_1) ~ "Unknown",
  TRUE ~ STRATUM_ID
))

I have tried to use the following to address looking through all COL columns:

data <- data %>% mutate(test = case_when(
  STRATUM_ID == "C20" & grep("COL", colnames(data)) %in% c("C", "D", "E") ~ "CLASS_B"))
data <- data %>% mutate(test = case_when(
  STRATUM_ID == "C20" & vars(starts_with("COL")) %in% c("C", "D", "E") ~ "CLASS_B"))

Forgive me as the dataset I’m working with is actually much, much larger and I’ve tried my best to simplify the issue here.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

library(dplyr)

data %>% mutate(NEW = case_when(
  STRATUM_ID == "C20" & if_any(starts_with("COL"), ~ . == "X") ~ "Class_A",
  STRATUM_ID == "C20" & if_any(starts_with("COL"), ~ . %in% c("C", "D", "E"))  ~ "Class_B",
  STRATUM_ID == "C20" & if_any(starts_with("COL"), ~ . %in% c("U", "V", "W", "Y")) ~ "Class_C",
  STRATUM_ID == "C20" & if_any(starts_with("COL"), ~ . == "T") ~ "Class_D",
  STRATUM_ID == "C20" & if_any(starts_with("COL"), ~ . %in% c("G", "Z", "R")) ~ "Class_E",
  STRATUM_ID == "C20" & if_any(starts_with("COL"), ~ . == "F") ~ "Class_F",
  STRATUM_ID == "C20" & is.na(data$COL_1) ~ "Unknown",
  TRUE ~ STRATUM_ID
))

I am not 100% sure what you would be repeating for each "COL" but based on your attempts this looks like what you are trying to do.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading