How do I remove numeric patterns of a certain length from a string in R?

Say I have the string:

some_string <- "this is a string with some numbers 9639998 21057535 1000 2021 2022"

I would like to remove numeric patterns that are 4, 7, or 8 characters long, except when the pattern is exactly 1000. So essentially I want the following result:

"this is a string with some numbers 1000"

Solution:

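Use gsub with the regex pattern \b(?:\d{7,8}|(?!1000\b)\d{4})\b. The first alternative matches 7- or 8-digit numbers. The second matches any 4-digit number except 1000, which the negative lookahead (?!1000\b) excludes. The \b word boundaries stop the pattern from matching part of a longer run of digits. Lookaheads need perl=TRUE, since R's default regex engine does not support them: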

some_string <- "this is a string with some numbers 9639998 21057535 1000 2021 2022"
output <- gsub("\\b(?:\\d{7,8}|(?!1000\\b)\\d{4})\\b", "", some_string, perl=TRUE)
output

[1] "this is a string with some numbers   1000  "
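
To check what the pattern does and does not match, it can help to run it against individual tokens with grepl. The sketch below reuses the pattern from the answer; the token list is made up for illustration and is not part of the original question.

```r
# Pattern from the answer: 7- or 8-digit numbers, or 4-digit numbers other than 1000
pattern <- "\\b(?:\\d{7,8}|(?!1000\\b)\\d{4})\\b"

# Hypothetical test tokens, chosen to probe each branch of the pattern
tokens <- c("9639998", "21057535", "1000", "2021", "123", "12345", "123456789")

# TRUE means gsub would remove the token; perl = TRUE is needed for the lookahead
matches <- grepl(pattern, tokens, perl = TRUE)
print(setNames(matches, tokens))
```

Only 9639998, 21057535, and 2021 should come back TRUE. The 3-, 5-, and 9-digit tokens are left alone because the word boundaries require the whole run of digits to have one of the target lengths.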

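A better version also tidies up the leftover whitespace:

some_string <- "this is a string with some numbers 9639998 21057535 1000 2021 2022"
# Replace each match, plus any surrounding whitespace, with a single space.
# The \\b anchors keep the pattern from matching inside longer digit runs.
output <- gsub("\\s*\\b(?:\\d{7,8}|(?!1000\\b)\\d{4})\\b\\s*", " ", some_string, perl=TRUE)
# Collapse repeated spaces, then trim leading and trailing whitespace.
output <- gsub("^\\s+|\\s+$", "", gsub("\\s{2,}", " ", output))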
output

[1] "this is a string with some numbers 1000"
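
If this cleanup is needed in more than one place, the two steps can be wrapped in a small function. This is only a sketch: remove_numbers is a hypothetical name, not something from the original answer, and it uses base R's trimws in place of the anchored gsub for the final trim.

```r
# Hypothetical helper: strips 7-8 digit numbers and 4-digit numbers other than
# 1000, then normalises whitespace. Works on character vectors.
remove_numbers <- function(x) {
  pattern <- "\\b(?:\\d{7,8}|(?!1000\\b)\\d{4})\\b"
  # Step 1: delete the matching numbers (lookahead requires perl = TRUE)
  x <- gsub(pattern, "", x, perl = TRUE)
  # Step 2: collapse runs of whitespace to one space, then trim both ends
  trimws(gsub("\\s+", " ", x))
}

cleaned <- remove_numbers(c(
  "this is a string with some numbers 9639998 21057535 1000 2021 2022",
  "id 12345678 code 1000",
  "order 12345 stays"
))
print(cleaned)
```

Because gsub and trimws are vectorised, the same function can be applied directly to a data frame column.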