Removing first word from data frame cell when it starts with lowercase letter in R

I want to clean up a taxonomy table of bacterial species in R by deleting the values of all cells that start with a lowercase letter.

I have this column in my taxonomy data frame:

Species
Tuwongella immobilis
Woesebacteria
unidentified marine
bacterium Ellin506

And I want:

Species
Tuwongella immobilis
Woesebacteria

I tried:

unwanted <- "^[:upper:]+[:lower:]+"
tax.clean$Species <- str_replace_all(tax.clean$Species, unwanted, "")

This does not work: the pattern does not match the species I want to remove.

>Solution :

If you are working with a data frame, I suggest using dplyr::filter() to drop the unwanted rows.

grepl() returns a logical vector. !grepl("^[[:lower:]]", Species) is TRUE for every value that does not start with a lowercase letter (^ anchors the match to the beginning of the string).

library(dplyr)

df %>% filter(!grepl("^[[:lower:]]", Species))

               Species
1 Tuwongella immobilis
2        Woesebacteria
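
For readers who prefer not to load dplyr, the same idea can be sketched in base R. This is only an illustration: the data frame below is rebuilt by hand from the four values shown in the question, and the names starts_lower, kept and blanked are made up for this example. It also shows a second option that keeps every row and only blanks the offending cells, in case that is closer to what "delete values from cells" means for your table.

```r
# Hypothetical reconstruction of the Species column from the question
tax.clean <- data.frame(
  Species = c("Tuwongella immobilis", "Woesebacteria",
              "unidentified marine", "bacterium Ellin506"),
  stringsAsFactors = FALSE
)

# TRUE where the value begins with a lowercase letter
starts_lower <- grepl("^[[:lower:]]", tax.clean$Species)

# Option 1: drop those rows (base R counterpart of the dplyr filter)
kept <- tax.clean[!starts_lower, , drop = FALSE]

# Option 2: keep all rows and replace the matching cells with NA
blanked <- tax.clean
blanked$Species[starts_lower] <- NA_character_

print(kept)
print(blanked)
```

Option 1 changes the number of rows, which matters if other columns of the taxonomy table must stay aligned with another object; Option 2 leaves the table's shape untouched.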