How to make a new variable that takes 1 if the string in another column contains a word with varying punctuation and letter case?

I have a column that looks something like this

col1 
"business"
"BusinesS"
"education"
"some BUSINESS ."
"business of someone, that is cool"
" not the b word"
"busi ness"
"busines." 
"businesses"
"something else"

And I need an efficient way of turning all this string data into a new column:

col1                col2
NA                  1
NA                  1
"education"         NA
NA                  1
NA                  1
" not the b word"   NA
NA                  1
NA                  1
NA                  1
"something else"    NA

So the common denominator is "busines", but I don't know how to efficiently handle all the spaces, punctuation, upper/lowercase letters, other words, etc. in a single mutate that creates a new column.

Solution:

library(dplyr)
library(stringr)

df %>%
  # col2 is 1 when col1 contains any form of "business", otherwise NA
  mutate(col2 = ifelse(str_detect(col1, "(?i)busi\\s?ness?"),
                       1,
                       NA))

We can use ifelse to set col2 to 1 if str_detect finds any form of "business" in col1, and NA if it doesn't. Note that (?i) makes the match case-insensitive, and the ? in \\s? and s? makes the preceding item optional: \\s? matches an optional whitespace character and s? matches an optional literal s. Because the pattern is not anchored, surrounding words and punctuation do not affect the match.
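To check the pattern against the sample data from the question, here is a minimal, self-contained sketch. The names df and result are illustrative. The second mutate step, which blanks col1 wherever a match was found, is an assumption based on the desired output shown in the question and is not part of the answer above. if_else is used in place of ifelse only because it is stricter about types.

```r
library(dplyr)
library(stringr)

# Sample data copied from the question
df <- tibble(col1 = c(
  "business", "BusinesS", "education", "some BUSINESS .",
  "business of someone, that is cool", " not the b word",
  "busi ness", "busines.", "businesses", "something else"
))

result <- df %>%
  mutate(
    # 1 where col1 contains some form of "business", NA otherwise
    col2 = if_else(str_detect(col1, "(?i)busi\\s?ness?"), 1, NA_real_),
    # Assumption: blank out col1 for matched rows, as in the desired output
    col1 = if_else(is.na(col2), col1, NA_character_)
  )

print(result)
```

Since the pattern is unanchored, it would also match "busines" inside longer words (for example "agribusiness"). If that is not wanted, adding a word boundary such as \\b before busi is one possible refinement.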
