I have this string:
x <- c("A B B C")
[1] "A B B C"
I am looking for the shortest way to get this:
[1] "A B C"
I have tried this, from "Removing duplicate words in a string in R":
paste(unique(x), collapse = ' ')
[1] "A B B C"
# does not work
Background:
In a dataframe column, I want to count only the unique words in each string.
>Solution :
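A regex-based approach could be shorter. Match one or more non-whitespace characters (\\S+) followed by a whitespace character (\\s), capture that as a group, then match one or more repetitions of the backreference (\\1+). In the replacement, use the backreference so that only a single copy of the match is kept. Note that this only collapses consecutive repeats, and each repeated word must be followed by whitespace, so a duplicate at the very end of the string is not removed.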
gsub("(\\S+\\s)\\1+", "\\1", x)
[1] "A B C"
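If trailing duplicates matter, a word-boundary variant of the same idea may help. This is only a sketch, not part of the original answer: it assumes words consist of word characters (\\w) and uses perl = TRUE so that \\b is interpreted as a word boundary. The name dedup_adjacent is made up for illustration.

```r
# Hypothetical helper: collapse runs of the same word, including a run
# that ends the string. Assumes words are made of \w characters.
dedup_adjacent <- function(s) {
  gsub("\\b(\\w+)(\\s+\\1\\b)+", "\\1", s, perl = TRUE)
}

dedup_adjacent("A B B C")  # "A B C"
dedup_adjacent("A B B")    # "A B"   (duplicate at the end is handled)
dedup_adjacent("A B A")    # "A B A" (non-adjacent duplicates are kept)
```

Like the pattern above, this still leaves non-adjacent duplicates in place, so it is not a substitute for splitting and calling unique.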
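Alternatively, split the string with strsplit, unlist the result, keep the unique words, and paste them back together. The attempt in the question fails because x is a single-element vector, so unique(x) has nothing to remove; splitting first gives unique one element per word. This also handles non-adjacent duplicates.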
paste(unique(unlist(strsplit(x, " "))), collapse = " ")
# [1] "A B C"
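For the data frame case mentioned in the background, the split-and-unique approach can be applied row by row. The following is a minimal sketch under assumptions: the data frame df and its column text are invented for illustration, and words are assumed to be separated by whitespace.

```r
# Hypothetical data; the column name `text` is an assumption.
df <- data.frame(text = c("A B B C", "A B A", "C C C"),
                 stringsAsFactors = FALSE)

# Split every string on runs of whitespace: one character vector per row.
words <- strsplit(df$text, "\\s+")

# Keep one copy of each word per row and paste the words back together.
df$dedup <- vapply(words,
                   function(w) paste(unique(w), collapse = " "),
                   character(1))

# Count the distinct words in each row.
df$n_unique <- lengths(lapply(words, unique))

df$dedup     # "A B C" "A B" "C"
df$n_unique  # 3 2 1
```

If the strings may start with whitespace, strsplit returns an empty first element, so wrapping the column in trimws() before splitting is probably worthwhile.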