Is there a way to remove all periods from a string unless it is a dot in a number in R?

I am working with a dataset that has a text variable, and I am not good at cleaning text. I tried my best, but I could not find the answer.
Let’s take this text as an example:

"I want. to remove all ... from the text except 5.3 or .5"

I want the output to be:

"I want to remove from the text except 5.3 or .5"

Could someone help me with that?

Solution:

You could try:

library(stringr)

# Remove periods with no digit on either side, plus a period at the end of the string
str_remove_all("I want to remove all ... from the text except 5.3.", "((?<!\\d)\\.(?!\\d)|\\.$)")
#> [1] "I want to remove all  from the text except 5.3"

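The pattern has two alternatives joined by | inside a group, (...|...). The first, (?<!\\d)\\.(?!\\d), matches a period that has no digit immediately before it and no digit immediately after it, so the periods in 5.3 and .5 are left alone. The second, \\.$, removes a period at the very end of the string, which the first alternative misses when that period directly follows a digit (as in the final 5.3. of the example).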
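If you would rather not load stringr, the same idea can be sketched in base R and applied to the string from the question. This is only an illustration: the helper name remove_non_numeric_periods is hypothetical, it assumes perl = TRUE so that the lookarounds are supported, and the extra whitespace cleanup is an addition that the pattern above does not do.

```r
# Hypothetical helper (not part of any package): drop periods that have no
# digit on either side, plus a period at the very end of the string.
remove_non_numeric_periods <- function(x) {
  # perl = TRUE switches base R to PCRE, which supports lookbehind/lookahead
  out <- gsub("(?<!\\d)\\.(?!\\d)|\\.$", "", x, perl = TRUE)
  # Assumed extra step: collapse the doubled space left where "..." was
  # removed, then trim leading/trailing whitespace
  trimws(gsub(" +", " ", out))
}

remove_non_numeric_periods("I want. to remove all ... from the text except 5.3 or .5")
#> [1] "I want to remove all from the text except 5.3 or .5"
```

One caveat with this pattern: a sentence-ending period that directly follows a digit in the middle of the string (as in "I have 5. Then ...") is kept, because only the final period of the string is covered by \\.$.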
