Return only the unique words

Let's say I have a string and I want only the unique words in the sentence, as separate elements.

a <- "an apple is an apple"
word <- function(a) {
  words <- c(strsplit(a, split = " "))
  return(unique(words))
}

word(a)

This returns

[[1]]
[1] "an"    "apple" "is"    "an"    "apple"

The output I'm expecting is

'an','apple','is'

What am I doing wrong? I'd really appreciate any help.

Cheers!

>Solution :

The problem is that strsplit(.) returns a list (one character vector per input string), and wrapping it in c(.) does not change that. unique will therefore be operating at the list level, comparing whole vectors, not at the word level.

c(strsplit(rep(a, 2), "\\s+"))
# [[1]]
# [1] "an"    "apple" "is"    "an"    "apple"
# [[2]]
# [1] "an"    "apple" "is"    "an"    "apple"
unique(c(strsplit(rep(a, 2), "\\s+")))
# [[1]]
# [1] "an"    "apple" "is"    "an"    "apple"
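
To see the same thing on the single-sentence input, here is a minimal sketch that inspects the structure directly. The variable name x is only for illustration and is not part of the original question.

```r
a <- "an apple is an apple"
x <- strsplit(a, " ")

is.list(x)            # TRUE: strsplit() returns a list
length(x)             # 1: one list element per input string
identical(c(x), x)    # TRUE: c() on a single list leaves it unchanged
length(unique(x))     # 1: unique() deduplicates list elements, not words
unique(x[[1]])        # extracting the character vector first deduplicates words
```

Because the list has only one element, unique has nothing to remove at that level, which is why the original function appears to do nothing.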

Alternatives:

  1. If length(a) is always 1, then perhaps

    unique(strsplit(a, "\\s+")[[1]])
    # [1] "an"    "apple" "is"   
    
  2. If length(a) can be 2 or more and you want a list of unique words for each sentence, then

    a2 <- c("an apple is an apple", "a pear is a pear", "an orange is an orange")
    lapply(strsplit(a2, "\\s+"), unique)
    # [[1]]
    # [1] "an"    "apple" "is"   
    # [[2]]
    # [1] "a"    "pear" "is"  
    # [[3]]
    # [1] "an"     "orange" "is"    
    

    (Note: this always returns a list, regardless of the number of sentences in the input.)

  3. If length(a) can be 2 or more and you want the unique words across all sentences, then

    unique(unlist(strsplit(a2, "\\s+")))
    # [1] "an"     "apple"  "is"     "a"      "pear"   "orange"
    

    (Note: this method also works well when length(a) is 1.)
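
Putting the third alternative back into the shape of the function from the question, one possible rewrite looks like the sketch below. The name unique_words and its argument name sentences are hypothetical, chosen only for this example.

```r
# Hypothetical rewrite of the question's word() function.
unique_words <- function(sentences) {
  # strsplit() returns a list with one character vector per input string;
  # unlist() flattens it so that unique() compares individual words.
  unique(unlist(strsplit(sentences, "\\s+")))
}

a <- "an apple is an apple"
unique_words(a)
# [1] "an"    "apple" "is"
```

This assumes the input has no leading whitespace. A string such as " an apple" would yield an empty string as its first word, so trimming the input with trimws() beforehand may be worth considering.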
