How to filter a string variable for values starting with a letter

I have a messy character variable like

df<-c("_oun_", "0000ff", "03815", "?3jhdb", "test", "1,000", "1.000")

and would like to filter out all values that are not words. I thought a good start would be to filter out all values that do not start with a letter.

How can I do this with the tidyverse? For the example above, the desired output would be "test".


Solution:

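Here are some options with stringr. The regex matches any value that begins with one or more letters (upper or lower case). It does not check the rest of the string, so a value such as "abc123" would also match; in this data only "test" qualifies.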

This returns the matching values directly, with no need to subset the data manually:

library(stringr)
str_subset(df, "^[:alpha:]+")
[1] "test"

With manual subsetting:

df[str_detect(df, "^[:alpha:]+")]
[1] "test"

or

df[str_which(df, "^[:alpha:]+")]
[1] "test"
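
Since the question asks about the tidyverse, the same test can also be applied to a data frame column with dplyr. This is a minimal sketch, not part of the original answer: the tibble `tbl` and its column name `value` are hypothetical, and the pattern uses the bracketed form `[[:alpha:]]`, which is equivalent here.

```r
library(dplyr)
library(stringr)

df <- c("_oun_", "0000ff", "03815", "?3jhdb", "test", "1,000", "1.000")

# Hypothetical wrapper: put the vector into a one-column tibble
tbl <- tibble(value = df)

# Keep only the rows whose value starts with a letter
words <- tbl %>%
  filter(str_detect(value, "^[[:alpha:]]"))

words$value
# [1] "test"
```

`filter()` keeps the rows where `str_detect()` returns TRUE, so this is the data frame counterpart of `df[str_detect(df, ...)]`.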

To keep the vector's length and positions intact (non-matching values become NA, and only the matched leading letters are returned):

str_extract(df, "^[:alpha:]+")
[1] NA     NA     NA     NA     "test" NA     NA
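
If the goal is to keep only values made up entirely of letters, rather than values that merely start with one, the pattern can be anchored at both ends. The sketch below illustrates the difference; the vector `extra` and its added values "abc123" and "Hello" are hypothetical and not part of the original data.

```r
library(stringr)

df <- c("_oun_", "0000ff", "03815", "?3jhdb", "test", "1,000", "1.000")

# Hypothetical extra values, added only to show where the two patterns differ
extra <- c(df, "abc123", "Hello")

# Starts with a letter: "abc123" is kept as well
starts_with_letter <- str_subset(extra, "^[[:alpha:]]")
starts_with_letter
# [1] "test"   "abc123" "Hello"

# Letters only: "$" anchors the match to the end of the string
only_letters <- str_subset(extra, "^[[:alpha:]]+$")
only_letters
# [1] "test"  "Hello"
```

On the original `df` both patterns return only "test", so the choice matters only once values that mix letters with digits or punctuation appear.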