R – Testing which rows of a multi-column dataframe contain keyword

Assume a dataframe dat with p-values.

dat <- data.frame(var1 = c("0.12", "0.12", "0.12*"), 
                  var2 = c("0.12", "0.12", "0.12"), 
                  var3 = c("0.12", "0.12", "0.12"))

How do I test which rows contain an asterisk?

Attempt 1:

dat %>%
+ mutate(anyTRUE = if_any(.rows = contains('\\*'), isTRUE))
   var1 var2 var3 anyTRUE
1  0.12 0.12 0.12    TRUE
2  0.12 0.12 0.12    TRUE
3 0.12* 0.12 0.12    TRUE

Solution:

Use str_detect or grepl. contains/matches/starts_with/ends_with are all select-helpers, used to match and select column names based on a pattern. Here, we want to detect rows whose values contain a pattern.
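To make the distinction concrete, here is a minimal sketch (assuming dplyr is installed; the names by_name and no_match are only illustrative) showing that contains() looks at column names, never at cell values:

```r
library(dplyr)

# Same data as in the question
dat <- data.frame(var1 = c("0.12", "0.12", "0.12*"),
                  var2 = c("0.12", "0.12", "0.12"),
                  var3 = c("0.12", "0.12", "0.12"))

# contains() matches a literal substring of the column NAMES
by_name <- select(dat, contains("var1"))
names(by_name)

# No column name contains "*", so nothing is selected,
# even though one cell value does contain an asterisk
no_match <- select(dat, contains("*"))
ncol(no_match)
```

That is why a select-helper cannot answer a question about the values in each row.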

library(stringr)
library(dplyr)
dat <- dat %>%
    mutate(anyTRUE = if_any(everything(), ~ str_detect(.x, fixed("*"))))

-output

dat
   var1 var2 var3 anyTRUE
1  0.12 0.12 0.12   FALSE
2  0.12 0.12 0.12   FALSE
3 0.12* 0.12 0.12    TRUE

NOTE: fixed() is used because the pattern is treated as a regular expression by default, and * is a regex metacharacter meaning zero or more of the preceding character. Either escape it (\\*) or wrap it in fixed(), which would also be faster.
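As a small sketch of the two options in the note, using base R's grepl on a single column so it runs without extra packages (the variable names escaped and literal are only illustrative):

```r
# Same data as in the question
dat <- data.frame(var1 = c("0.12", "0.12", "0.12*"),
                  var2 = c("0.12", "0.12", "0.12"),
                  var3 = c("0.12", "0.12", "0.12"))

# Option 1: escape the metacharacter so the regex engine reads it literally
escaped <- grepl("\\*", dat$var1)

# Option 2: bypass the regex engine with fixed = TRUE
literal <- grepl("*", dat$var1, fixed = TRUE)

# Both approaches flag the same elements
identical(escaped, literal)
```

Both forms should mark only the third element of var1.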


Or, using base R:

dat$anyTRUE <- Reduce(`|`, lapply(dat, grepl, pattern = "*", fixed = TRUE))
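If the row numbers or the matching rows themselves are wanted rather than a new logical column, the same base R expression can be reused. A minimal sketch, where has_star is an illustrative name not taken from the answer:

```r
# Same data as in the question
dat <- data.frame(var1 = c("0.12", "0.12", "0.12*"),
                  var2 = c("0.12", "0.12", "0.12"),
                  var3 = c("0.12", "0.12", "0.12"))

# One logical vector per column, combined element-wise with OR
has_star <- Reduce(`|`, lapply(dat, grepl, pattern = "*", fixed = TRUE))

# Indices of the rows that contain an asterisk in any column
which(has_star)

# The matching rows themselves
dat[has_star, ]
```

With the example data this should pick out only row 3.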