Replace multiple phrases with NA (or blank) in R

I am working in R.

I have some phrases that I want to remove from text strings in a data frame.
words_remove holds the phrases I want to replace. A phrase should be removed only when it appears in the string in full; partial matches must be left alone.

words_remove <- c("red cats", "blue dogs", "pink horse")

This is my data frame:

data <- data.frame(row_id=1:4, text = c("red cats don't exist", "I have a blue dog", "I don't like blue dogs", "I like horses"))
row_id  text
1       red cats don't exist
2       I have a blue dog
3       I don't like blue dogs
4       I like horses

I want to replace every occurrence of a words_remove phrase in the text column with NA (or, even better, remove it entirely).

My required output:

row_id  text
1       don't exist
2       I have a blue dog
3       I don't like
4       I like horses

In my real data frame, words_remove contains many phrases, so I think writing a case_when() branch for each one would be too time-consuming.

Any ideas?

Solution:

You may form a regex alternation of the phrases and do a replacement on that:

words_remove <- c("red cats", "blue dogs", "pink horse")
regex <- paste0("\\s*\\b(?:", paste(words_remove, collapse="|"), ")\\b\\s*")
data$text <- gsub("^\\s+|\\s+$", "", gsub(regex, " ", data$text))
data

row_id              text
1      1       don't exist
2      2 I have a blue dog
3      3      I don't like
4      4     I like horses

The strategy here is to replace any matching phrase plus any surrounding whitespace with just a single space. The outer call to gsub() strips off any remaining leading/trailing whitespace.
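
One caveat when the phrases are pasted straight into a pattern: any regex metacharacter inside a phrase (such as the "." in "1.5 kg") keeps its special meaning. The following is a minimal sketch of one way to guard against that, not part of the original answer. It assumes the PCRE engine (perl = TRUE), wraps each phrase in \Q...\E so it is matched literally, and assumes no phrase contains the sequence \E. The function name remove_phrases and the extra sample strings are hypothetical. The final step, which turns strings that end up empty into NA, is optional.

```r
# Hypothetical helper (not from the original answer): removes whole phrases
# from a character vector, treating each phrase as literal text.
remove_phrases <- function(x, phrases) {
  # \Q...\E tells PCRE to treat everything in between as literal characters.
  # Assumption: no phrase contains the two-character sequence \E.
  quoted <- paste0("\\Q", phrases, "\\E")
  regex <- paste0("\\s*\\b(?:", paste(quoted, collapse = "|"), ")\\b\\s*")

  # Replace each phrase (plus surrounding whitespace) with a single space,
  # then trim whatever whitespace is left at either end.
  out <- trimws(gsub(regex, " ", x, perl = TRUE))

  # Optional: strings that became empty are set to NA.
  out[!is.na(out) & out == ""] <- NA_character_
  out
}

words_remove <- c("red cats", "blue dogs", "1.5 kg")
text <- c("red cats don't exist", "I have a blue dog",
          "I bought 1.5 kg of apples", "I bought 105 kg", "blue dogs")

remove_phrases(text, words_remove)
# "don't exist" "I have a blue dog" "I bought of apples" "I bought 105 kg" NA
```

Without the quoting, the pattern "1.5 kg" would also match "105 kg", because "." matches any character. Note that the \b anchors only behave as whole-phrase boundaries when each phrase starts and ends with a letter, digit, or underscore.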
