Replacing numbers with characters in a text column in R

I would like to replace some numbers in the text column of my data. The numbers have either 8 or 9 digits and come in two formats. Here is a snapshot of the data:

df <- data.frame(
  notes = c(
    'my number is 123-41-567',
    "321 12 788 is valid",
    'why not taking 987-012-678',
    '120 967 325 is correct'
  )
)

df %>% select(notes)

                       notes
1    my number is 123-41-567
2        321 12 788 is valid
3 why not taking 987-012-678
4     120 967 325 is correct

I need to replace them all with a term such as aaaaa. Hence, the data should look like:

                 notes
1   my number is aaaaa
2       aaaaa is valid
3 why not taking aaaaa
4     aaaaa is correct

Thank you in advance!

Solution:

Assuming the examples really do cover all possible cases (I would be careful about that), you can do this with the following regular expression:

\\d{3}( |-)\\d{2,3}( |-)\\d{3}
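To see what this pattern actually picks up before replacing anything, it can help to extract the matches first. The following is a minimal sketch in base R, using the four sample strings from the question; the variable names `notes`, `pattern`, and `matched` are only illustrative.

```r
# Sample strings copied from the question
notes <- c(
  "my number is 123-41-567",
  "321 12 788 is valid",
  "why not taking 987-012-678",
  "120 967 325 is correct"
)

# 3 digits, a space or hyphen, 2 or 3 digits, a space or hyphen, 3 digits
pattern <- "\\d{3}( |-)\\d{2,3}( |-)\\d{3}"

# Pull out the first match in each string to see exactly what would be replaced
matched <- regmatches(notes, regexpr(pattern, notes))
print(matched)
```

If a number in the real data does not show up here, the pattern needs adjusting before it is used for replacement.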

Here’s the code for replacing:

library(dplyr)
library(stringr)

df %>% 
    mutate(
        notes = str_replace_all(notes, '\\d{3}( |-)\\d{2,3}( |-)\\d{3}', 'XXXXXX')
    )

                  notes
1   my number is XXXXXX
2       XXXXXX is valid
3 why not taking XXXXXX
4     XXXXXX is correct
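
Because the pattern has no boundaries, it would also match inside a longer run of digits. If that matters for the real data, one possible tightening is to add word boundaries (`\\b`) on both sides. The sketch below uses base R `gsub()` with `perl = TRUE` and the replacement term `aaaaa` from the question; the third string is a made-up case that is not in the original data, and `[ -]` is simply a character-class spelling of `( |-)`.

```r
notes <- c(
  "my number is 123-41-567",
  "321 12 788 is valid",
  "order 1234-56-7890 shipped"  # hypothetical extra case, not from the question
)

# Same shape as the original pattern, but anchored with word boundaries
# so that it cannot match in the middle of a longer digit sequence
strict_pattern <- "\\b\\d{3}[ -]\\d{2,3}[ -]\\d{3}\\b"

masked <- gsub(strict_pattern, "aaaaa", notes, perl = TRUE)
print(masked)
```

Without the boundaries, the third string would have its middle part replaced; with them, it is left untouched. Whether that is the desired behavior depends on what the real data looks like.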