Say I have the string –
some_string <- "this is a string with some numbers 9639998 21057535 1000 2021 2022"
I would like to remove numbers that are 7, 8, or 4 digits long, EXCEPT if the number is 1000. So essentially I want the following result –
"this is a string with some numbers 1000"
Solution:
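Use gsub() here with the Perl-compatible regex pattern \b(?:\d{7,8}|(?!1000\b)\d{4})\b. It matches a whole 7- or 8-digit number, or a whole 4-digit number unless that number is 1000; the \b word boundaries stop it from matching part of a longer number: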
some_string <- "this is a string with some numbers 9639998 21057535 1000 2021 2022"
output <- gsub("\\b(?:\\d{7,8}|(?!1000\\b)\\d{4})\\b", "", some_string, perl=TRUE)
output
[1] "this is a string with some numbers   1000  "
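To see what each part of that pattern does, it can be assembled from commented fragments with paste0(). This is only an illustrative sketch in base R; the two test strings in x are made up to show that the \b anchors leave longer or shorter numbers (such as 12345 or 123) untouched.

```r
# Build the same pattern as above from commented pieces (illustrative only)
pattern <- paste0(
  "\\b(?:",             # start at a word boundary, open a non-capturing group
  "\\d{7,8}",           # either a run of 7 or 8 digits ...
  "|(?!1000\\b)\\d{4}", # ... or exactly 4 digits that are not the number 1000
  ")\\b"                # close the group and require a word boundary at the end
)

# The assembled string is identical to the one-line pattern
print(identical(pattern, "\\b(?:\\d{7,8}|(?!1000\\b)\\d{4})\\b"))

# Made-up test strings: the 5-digit and 3-digit numbers are not touched,
# because \b forbids matching only part of a number
x <- c("9639998 1000 2022", "12345 123 1000")
print(gsub(pattern, "", x, perl = TRUE))
```

Splitting a regex like this costs nothing at run time and makes the lookahead exception for 1000 much easier to spot and change later.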
That leaves stray spaces behind, so a better version also tidies up the loose whitespace:
some_string <- "this is a string with some numbers 9639998 21057535 1000 2021 2022"
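# Remove the unwanted numbers together with any whitespace around them;
# the \b anchors prevent stripping 4 digits out of a longer number like 12345
output <- gsub("\\s*\\b(?:\\d{7,8}|(?!1000\\b)\\d{4})\\b\\s*", " ", some_string, perl=TRUE)
# Collapse repeated spaces, then trim both ends
output <- trimws(gsub("\\s{2,}", " ", output))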
output
[1] "this is a string with some numbers 1000"
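If this needs to be applied in several places, the same remove-then-tidy idea can be wrapped in a small function. drop_numbers is a hypothetical name; the sketch keeps the \b anchors from the first pattern and uses base R's trimws() for the trimming. Because gsub() and trimws() are vectorised, it also works on a character vector.

```r
# Hypothetical helper: drop 7- or 8-digit numbers and 4-digit numbers other
# than 1000, then squeeze the leftover whitespace (base R only)
drop_numbers <- function(x) {
  pattern <- "\\b(?:\\d{7,8}|(?!1000\\b)\\d{4})\\b"
  x <- gsub(pattern, "", x, perl = TRUE)  # remove the unwanted numbers
  x <- gsub("\\s+", " ", x)               # collapse runs of whitespace
  trimws(x)                               # strip leading/trailing spaces
}

some_string <- "this is a string with some numbers 9639998 21057535 1000 2021 2022"
print(drop_numbers(some_string))
```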