Using unlist on a column of strings within a data frame

I have a data frame with a column that contains a string with multiple names separated by commas:

df = data.frame(my.text = c("John Smith, Johnny Smith, John Smith", "John Doe, Doe, Johnny", "Jane Doe, Jane Doe"))

df
                               my.text
1 John Smith, Johnny Smith, John Smith
2                John Doe, Doe, Johnny
3                   Jane Doe, Jane Doe

I’d like to eliminate the duplicate names within each row (i.e. keep only the unique names) and store the result back in my.text so that it looks like this:

df
                               my.text
1             John Smith, Johnny Smith
2                John Doe, Doe, Johnny
3                             Jane Doe

This code achieves this for a single string/row:

df$my.text[1] = paste(unique(unlist(strsplit(df$my.text[1], split = ", "))), collapse = ", ")

But how do I apply this to the entire my.text column? I have tried mapply but cannot figure out how to pass it several functions at once. Or perhaps there’s a better way I’m overlooking?

Solution:

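strsplit is already vectorized, so it splits every row of the column in one call and returns a list with one character vector per row. To reduce each of those vectors to a single string again, we can use sapply and paste: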

sapply(strsplit(df$my.text, ",\\s*"), function(z) paste(unique(z), collapse = ", "))
# [1] "John Smith, Johnny Smith" "John Doe, Doe, Johnny"    "Jane Doe"                
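
To store the result back in the data frame, assign it to the column. The following is a minimal sketch rather than the only way to do it: the helper name dedupe_names is made up for illustration, vapply is swapped in for sapply so that each result is checked to be a single string, and stringsAsFactors = FALSE is an assumption added so the column is character (not factor) on R versions before 4.0.

```r
# Example data from the question; stringsAsFactors = FALSE keeps my.text
# as a character column on R < 4.0 (assumption, not in the original post).
df <- data.frame(
  my.text = c("John Smith, Johnny Smith, John Smith",
              "John Doe, Doe, Johnny",
              "Jane Doe, Jane Doe"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop repeated names from one row's pieces and
# paste the remaining names back into a single comma-separated string.
dedupe_names <- function(parts) paste(unique(parts), collapse = ", ")

# Split every row on a comma followed by optional whitespace, then apply
# the helper to each row. vapply fails loudly if a result is not exactly
# one character string.
df$my.text <- vapply(strsplit(df$my.text, ",\\s*"), dedupe_names, character(1))

df$my.text
```

Because unique keeps the first occurrence of each name, the original order of the names within a row is preserved.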