Remove duplicate values in a cell without removing the row

I have a column of string values, each containing numbers separated by white space, and they need to remain strings. How can I remove the duplicate values and the values longer than 4 characters within each cell?

company        counts 
company1       2222 2222 45345234 425352352352 6574745 299
company2       9909 4363465246 543 323 9909 3454534534 768 

I would like to end up with something like this:

company        counts 
company1       2222 299
company2       9909 543 323 768 

Solution:

Split each string on whitespace with strsplit, drop the values longer than 4 characters and the duplicates, then paste the rest back together:

sapply(
    strsplit(dat$counts, "\\s+"),
    \(x) paste(x[nchar(x) <= 4 & (!duplicated(x))], collapse=" ")
)
##[1] "2222 299"         "9909 543 323 768"

Where dat was:

dat <- read.csv(text="company,counts
company1,2222 2222 45345234 425352352352 6574745 299
company2,9909 4363465246 543 323 9909 3454534534 768")
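
The sapply call above returns a character vector rather than changing dat. To get the table shown in the question, the result can be written back into the counts column. The following is a minimal sketch under stated assumptions: clean_counts and its max_chars argument are hypothetical names introduced here for illustration, not part of the original answer, and the data frame is rebuilt with data.frame() so the snippet runs on its own.

```r
# Rebuild the example data so the sketch is self-contained.
dat <- data.frame(
    company = c("company1", "company2"),
    counts = c(
        "2222 2222 45345234 425352352352 6574745 299",
        "9909 4363465246 543 323 9909 3454534534 768"
    ),
    stringsAsFactors = FALSE
)

# clean_counts is a hypothetical helper (assumption, not from the answer).
# It keeps the first occurrence of each value that has at most max_chars
# characters and returns one string per input element.
clean_counts <- function(x, max_chars = 4) {
    vapply(
        # trimws avoids an empty first token when a cell starts with a space
        strsplit(trimws(x), "\\s+"),
        function(tokens) {
            keep <- nchar(tokens) <= max_chars & !duplicated(tokens)
            paste(tokens[keep], collapse = " ")
        },
        character(1)
    )
}

# Overwrite the column in place; the rows themselves are untouched.
dat$counts <- clean_counts(dat$counts)
dat
```

vapply with character(1) is used here instead of sapply so that the result is always a character vector with one element per row, which keeps the column a string column as the question requires.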