I am working on a dataset that has a text variable in it, and I am not good at cleaning text. I tried my best, but I could not find an answer.
Let’s take this text as an example:
"I want. to remove all ... from the text except 5.3 or .5"
I want the output to be:
"I want to remove from the text except 5.3 or .5"
Could someone help me with that?
>Solution :
You could try:
library(stringr)
str_remove_all("I want to remove all ... from the text except 5.3.", "((?<!\\d)\\.(?!\\d)|\\.$)")
#> [1] "I want to remove all  from the text except 5.3"
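The pattern is an alternation group (...|...) with two parts. The first, (?<!\\d)\\.(?!\\d), says ‘remove periods that have no digit immediately before them and no digit immediately after them’, so the periods in 5.3 and .5 are left alone. The second, \\.$, makes sure a period at the very end of the string is removed as well, since the first part skips it when it follows a digit (as in the trailing period of 5.3.).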
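To check the pattern against the exact string from the question, here is a minimal sketch. The helper name remove_stray_periods is made up for illustration (it is not part of stringr), and the str_squish() step is my own addition: removing the ... leaves two adjacent spaces behind, and str_squish() collapses them.

```r
library(stringr)

# Hypothetical helper (not a stringr function): drop periods that are not part of a number
remove_stray_periods <- function(x) {
  # A period with no digit on either side, or a period at the very end of the string
  no_dots <- str_remove_all(x, "((?<!\\d)\\.(?!\\d)|\\.$)")
  # Removing " ... " leaves a double space; str_squish() trims and collapses whitespace
  str_squish(no_dots)
}

x <- "I want. to remove all ... from the text except 5.3 or .5"
cleaned <- remove_stray_periods(x)
cleaned
#> [1] "I want to remove all from the text except 5.3 or .5"

# Assumed alternative, checked only on these examples:
# remove any period that is not immediately followed by a digit
cleaned_alt <- str_squish(str_remove_all(x, "\\.(?!\\d)"))
```

One caveat: with the original pattern, a period that follows a digit in the middle of the string (as in "5. Then") is kept, because only the end-of-string case is handled separately. If those should be removed too, the simpler \\.(?!\\d) variant shown at the end may be enough, but test it on your own data first.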