Using str_detect in R to sort a datatable based on a character variable column

I have a data table with mostly numeric columns, plus one character column that takes about 30 distinct values. I would like to use str_detect in R to check that character column against a vector of strings, then create a new column whose value depends on whether the original column contains any of the strings in that vector.

Explaining this in words isn't very clear, so I'll use iris as an example, although that dataset is a bit too simple; I wouldn't really want to do this with iris itself.

This is what I tried:

library(dplyr)
library(stringr)
mydesiredgrouping <- c("virginica","versicolor")
test <- iris %>% mutate(Group1 = if_else(str_detect(iris$Species, mydesiredgrouping), "mygroup", "notmygroup"))

But when I do this I get the following error.

Error in mutate(): ℹ In argument: Group1 = if_else(...). Caused by
error in str_detect(): ! Can’t recycle string (size 150) to match
pattern (size 2). Run rlang::last_trace() to see where the error
occurred.

Does anyone know what I need to change?
This is the output I am hoping to get:

Desired Output

EDIT

I forgot to mention that in my own dataset I had some partial matches, which was why I was trying to use str_detect in the first place. Fortunately, I got the answer I was looking for despite this, but I am adding this information for clarity.

Solution:

Ronak’s answer is the best if you are sure you have complete matches. If you have partial matches, you can use vertical bars to collapse the search terms into one regex, and then pass that as a pattern to str_detect:

test <- iris %>% mutate(Group1 = if_else(str_detect(Species, paste(mydesiredgrouping, collapse = "|")), "mygroup", "notmygroup"))
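
For context on why the original call fails: str_detect is vectorised over both its string and pattern arguments, so a pattern vector of length 2 is matched element by element against the strings and cannot be recycled to 150 rows. Collapsing the terms into one regex gives a pattern of length 1, which recycles cleanly. Below is a minimal, self-contained sketch of that approach. The exact-match variant using %in% is an assumption about what a complete-match solution might look like; the other answer mentioned above is not reproduced here. The names pattern and test_exact are illustrative only.

```r
library(dplyr)
library(stringr)

mydesiredgrouping <- c("virginica", "versicolor")

# Collapse the search terms into a single regex: "virginica|versicolor".
# A length-1 pattern is recycled against every row of the column.
pattern <- paste(mydesiredgrouping, collapse = "|")

# Partial (regex) matching: TRUE wherever any of the terms occurs in Species.
test <- iris %>%
  mutate(Group1 = if_else(str_detect(Species, pattern), "mygroup", "notmygroup"))

# Exact matching (assumed alternative): no regex involved,
# so it only fits when the values match the terms completely.
test_exact <- iris %>%
  mutate(Group1 = if_else(Species %in% mydesiredgrouping, "mygroup", "notmygroup"))

# Cross-tabulate to check which species landed in which group.
table(test$Species, test$Group1)
```

One caveat with the regex version: if the search terms might contain regex metacharacters such as "." or "(", they would need escaping before being collapsed, otherwise they are interpreted as regex syntax rather than literal text.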