Let's say I have a string and I only want the unique words in the sentence, as separate elements:
a = "an apple is an apple"
word <- function(a){
words<- c(strsplit(a,split = " "))
return(unique(words))
}
word(a)
This returns
[[1]]
[1] "an" "apple" "is" "an" "apple"
and the output I'm expecting is
'an','apple','is'
What am I doing wrong? I'd really appreciate any help.
Cheers!
>Solution :
The problem is that strsplit(.) returns a list (one character vector per input string), and wrapping it in c(.) does not change that. unique then operates at the list level, comparing whole list elements with each other, not at the word level inside each element.
c(strsplit(rep(a, 2), "\\s+"))
# [[1]]
# [1] "an" "apple" "is" "an" "apple"
# [[2]]
# [1] "an" "apple" "is" "an" "apple"
unique(c(strsplit(rep(a, 2), "\\s+")))
# [[1]]
# [1] "an" "apple" "is" "an" "apple"
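A quick way to confirm this is to inspect the structure of the intermediate value. This is only a minimal sketch using the `a` from the question; `words` is the same name the question's function uses internally.

```r
a <- "an apple is an apple"

# strsplit() returns a list with one element per input string;
# c() around a single list leaves it as that same list.
words <- c(strsplit(a, split = " "))

is.list(words)      # TRUE
length(words)       # 1 (one sentence, so one list element)
length(words[[1]])  # 5 (the words live inside that element)
```

Since the list holds a single element, unique() has nothing to deduplicate and returns it unchanged.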
Alternatives:
- If length(a) is always 1, then perhaps
  unique(strsplit(a, "\\s+")[[1]])
  # [1] "an" "apple" "is"
- If length(a) can be 2 or more and you want a list of unique words for each sentence, then
  a2 <- c("an apple is an apple", "a pear is a pear", "an orange is an orange")
  lapply(strsplit(a2, "\\s+"), unique)
  # [[1]]
  # [1] "an" "apple" "is"
  # [[2]]
  # [1] "a" "pear" "is"
  # [[3]]
  # [1] "an" "orange" "is"
  (Note: this always returns a list, regardless of the number of sentences in the input.)
- If length(a) can be 2 or more and you want unique words across all sentences, then
  unique(unlist(strsplit(a2, "\\s+")))
  # [1] "an" "apple" "is" "a" "pear" "orange"
  (Note: this method also works well when length(a) is 1.)
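Putting this back into the shape of the question's function, one possible rewrite is sketched below. The name `unique_words` is hypothetical, and the sketch assumes you want a flat character vector of unique words across all input strings.

```r
# Hypothetical helper: split on runs of whitespace, flatten the
# resulting list into one character vector, then deduplicate.
unique_words <- function(x) {
  unique(unlist(strsplit(x, "\\s+")))
}

a <- "an apple is an apple"
unique_words(a)
# [1] "an"    "apple" "is"
```

If you need the per-sentence grouping instead, the lapply(.) variant above is the better fit, since unlist(.) discards which sentence each word came from.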