Shortest way to remove duplicate words from string

I have this string:

x <- c("A B B C")

[1] "A B B C"

I am looking for the shortest way to get this:

[1] "A B C"

I have tried this, based on "Removing duplicate words in a string in R":

paste(unique(x), collapse = ' ')

[1] "A B B C"
# does not work

Background:
In a data frame column, I want to count only the unique words in each string.

Solution:

A regex-based approach can be shorter: match one or more non-whitespace characters (\\S+) followed by a whitespace character (\\s), capture that as a group, and then match one or more repetitions of the backreference (\\1+). In the replacement, the backreference \\1 returns a single copy of the match. This collapses adjacent repeats only.

gsub("(\\S+\\s)\\1+", "\\1", x)
[1] "A B C"
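
To see where the pattern above stops helping, here is a small sketch on a few made-up inputs (x_mid, x_apart and x_end are hypothetical names, not from the question). It also shows one possible variant, dedup_adjacent, which is my own illustration rather than part of the original answer: it assumes words consist of letters, digits or underscores (\\w+) and uses perl = TRUE with word boundaries so that a repeat at the end of the string is caught too.

```r
# Hypothetical test inputs (not from the original question)
x_mid   <- "A B B C"  # adjacent repeat in the middle
x_apart <- "A B C B"  # repeat that is not adjacent
x_end   <- "A B B"    # adjacent repeat at the end, no trailing space

# Pattern from the answer: needs whitespace after every copy of the word
gsub("(\\S+\\s)\\1+", "\\1", x_mid)    # "A B C"
gsub("(\\S+\\s)\\1+", "\\1", x_apart)  # "A B C B" (unchanged)
gsub("(\\S+\\s)\\1+", "\\1", x_end)    # "A B B"   (unchanged)

# Illustrative variant (an assumption, not the answer's method):
# a whole word followed by one or more whitespace-separated copies of itself
dedup_adjacent <- function(s) {
  gsub("\\b(\\w+)(\\s+\\1\\b)+", "\\1", s, perl = TRUE)
}

dedup_adjacent(x_mid)    # "A B C"
dedup_adjacent(x_end)    # "A B"
dedup_adjacent(x_apart)  # "A B C B" (still adjacent repeats only)
```

Neither pattern removes duplicates that are separated by other words.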

Alternatively, to remove duplicates anywhere in the string, split it with strsplit, unlist the result, keep the unique words, and paste them back together

paste(unique(unlist(strsplit(x, " "))), collapse = " ")
# [1] "A B C"
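
For the data frame case mentioned in the question's background, the same split-and-unique idea can be applied row by row. The following is only a sketch: the data frame df, its column text, and the new columns dedup and n_unique are made-up names for illustration, and it assumes words are separated by whitespace with no leading whitespace in the strings.

```r
# Hypothetical data; the column name `text` is an assumption
df <- data.frame(text = c("A B B C", "A A A", "D E"),
                 stringsAsFactors = FALSE)

# One character vector of words per row, split on runs of whitespace
words <- strsplit(df$text, "\\s+")

# De-duplicated string for each row
df$dedup <- vapply(words,
                   function(w) paste(unique(w), collapse = " "),
                   character(1))

# Number of distinct words in each row
df$n_unique <- lengths(lapply(words, unique))

df$dedup     # "A B C" "A" "D E"
df$n_unique  # 3 1 2
```

Unlike unlist(strsplit(...)), which merges all rows into one vector, working on the list returned by strsplit keeps the words of each row separate.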