Mutate new variable based on whether a set of strings is present in multiple columns in R

I have clinical data with the medications participants are using, and I want to create new binary variables with medication categories (e.g., statin use). To do this I want to search for a set of strings (medication names) in multiple columns (medication1, medication2, etc.) to define the new variables.

Given the following code:

library(tidyverse)
ID <- sprintf("User % d", 1:4) 
med1 <- c("rosuvastatin", "ezetimibe", "insulin", "Lipitor")
med2 <- c("niacin", "insulin", "simvastatin", NA)
df <- data.frame(ID, med1, med2)

df <- df%>%
  mutate(use_statin = case_when(if_any(starts_with("med"), ~ str_detect(., pattern = "statin")) ~ 1))%>%
  mutate(use_statin = case_when(if_any(starts_with("med"), ~ str_detect(., pattern = "Lipitor")) ~ 1))
df$use_statin

I am hoping the use_statin column would display "1 NA 1 1", but instead it displays "NA NA NA 1". It appears that the second mutate call overwrites the result of the first.

Solution:

We can use a single if_any call whose pattern matches either string, joining the alternatives with | (regex OR). With only one mutate, there is no second assignment to overwrite the first match.

library(dplyr)
library(stringr)
df %>% 
 mutate(use_statin = case_when(if_any(starts_with("med"),
       ~ str_detect(.x, pattern = "statin|Lipitor"))~ 1))

-output

       ID         med1        med2 use_statin
1 User  1 rosuvastatin      niacin          1
2 User  2    ezetimibe     insulin         NA
3 User  3      insulin simvastatin          1
4 User  4      Lipitor        <NA>          1
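
When the set of medication names grows, the pattern can be built from a character vector rather than typed by hand. The following is a minimal sketch under that assumption: the vector name statin_terms and the case-insensitive matching via regex(ignore_case = TRUE) are illustrative choices, not part of the original answer.

```r
library(dplyr)
library(stringr)

# Same example data as in the question
df <- data.frame(
  ID   = sprintf("User % d", 1:4),
  med1 = c("rosuvastatin", "ezetimibe", "insulin", "Lipitor"),
  med2 = c("niacin", "insulin", "simvastatin", NA)
)

# Hypothetical search terms; extend with more generic or brand names as needed
statin_terms <- c("statin", "lipitor")

# Collapse the terms into one regex alternation: "statin|lipitor"
statin_pattern <- str_c(statin_terms, collapse = "|")

result <- df %>%
  mutate(use_statin = case_when(
    if_any(starts_with("med"),
           ~ str_detect(.x, regex(statin_pattern, ignore_case = TRUE))) ~ 1
  ))

result$use_statin  # 1 NA 1 1
```

Terms that contain regex metacharacters (for example a literal dot or parenthesis) would need to be escaped before being collapsed into the pattern.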

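In the OP's code, the use_statin column is first created from the statin match and then overwritten by the result of the Lipitor match. To keep two separate mutate calls, combine the new result with the existing column using | (OR); the unary + converts the logical result back to numeric: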

df%>%
  mutate(use_statin = case_when(if_any(starts_with("med"),
   ~ str_detect(., pattern = "statin")) ~ 1))%>%
  mutate(use_statin = +(case_when(if_any(starts_with("med"), 
  ~ str_detect(., pattern = "Lipitor")) ~ 1)|use_statin))

-output

       ID         med1        med2 use_statin
1 User  1 rosuvastatin      niacin          1
2 User  2    ezetimibe     insulin         NA
3 User  3      insulin simvastatin          1
4 User  4      Lipitor        <NA>          1
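
To see the overwrite directly, it can help to inspect the column after each step. The sketch below rebuilds the example data and, as an alternative to the | approach above, uses dplyr's coalesce() to fill only the rows that are still NA. Note that coalesce() is a swapped-in technique for illustration, not what the answer uses, and the object names step1, overwritten, and combined are hypothetical.

```r
library(dplyr)
library(stringr)

# Same example data as in the question
df <- data.frame(
  ID   = sprintf("User % d", 1:4),
  med1 = c("rosuvastatin", "ezetimibe", "insulin", "Lipitor"),
  med2 = c("niacin", "insulin", "simvastatin", NA)
)

# Step 1: flag rows where any med column contains "statin"
step1 <- df %>%
  mutate(use_statin = case_when(
    if_any(starts_with("med"), ~ str_detect(.x, "statin")) ~ 1
  ))
step1$use_statin        # 1 NA 1 NA

# Step 2 as written in the question: the column is replaced wholesale
overwritten <- step1 %>%
  mutate(use_statin = case_when(
    if_any(starts_with("med"), ~ str_detect(.x, "Lipitor")) ~ 1
  ))
overwritten$use_statin  # NA NA NA 1

# Alternative step 2: keep existing 1s and fill remaining NAs
# from the Lipitor match
combined <- step1 %>%
  mutate(use_statin = coalesce(
    use_statin,
    case_when(if_any(starts_with("med"), ~ str_detect(.x, "Lipitor")) ~ 1)
  ))
combined$use_statin     # 1 NA 1 1
```

Because coalesce() takes the first non-missing value per row, earlier flags survive and each later step only adds to them.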