How to remove data after certain characters

I need to remove all characters from a value after the leading letter D and the two digits that follow it. I am not sure how to start.

I have a data frame with a column of type character.

  • The column is called "Eircode"

The postal codes go from D01 to D24 (these are Dublin postal codes).


The values are entered as full codes, for example D03P820.
Everything after the first three characters needs to be removed.

I need to be able to remove the characters after the second digit.

My dataframe is called "MainSchools"

So if the "Eircode" is D03P820, I need to have it as D03 after my change.

I would preferably like to be able to do this with the Tidyverse package if possible.

>Solution:

You may use sub here:

df <- data.frame(Eircode=c("D15P820", "K78YD27", "D03P820"),
                 stringsAsFactors=FALSE)
df$Eircode <- sub("^(D(?:0[1-9]|1[0-9]|2[0-4])).*$", "\\1", df$Eircode)
df

  Eircode
1     D15
2 K78YD27
3     D03

The regex pattern used above matches and captures Dublin postal codes as follows:

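^             start of the value
(             start of capture group \1
  D           match D
  (?:
    0[1-9]    followed by 01-09
    |         OR
    1[0-9]    10-19
    |         OR
    2[0-4]    20-24
  )
)             end of capture group \1
.*$           the rest of the value, which is dropped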
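To see where the alternation draws its boundaries, here is a small check with grepl. The test values are made up for illustration (they are not from the question), and perl = TRUE is passed only to make the regex engine explicit:

```r
# Hypothetical test values: two inside the D01-D24 range, three outside it
codes <- c("D01A123", "D24B456", "D00C789", "D25D012", "K78YD27")

# Same prefix pattern as above, without the capture group or the trailing .*$
is_dublin <- grepl("^D(?:0[1-9]|1[0-9]|2[0-4])", codes, perl = TRUE)

print(data.frame(codes, is_dublin))
```

D00 and D25 are rejected because no branch of the alternation covers them, and K78YD27 fails on the leading D.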

Using \1 as the replacement in sub keeps only the captured 3-character Dublin postal code. Values that do not match the pattern, such as K78YD27, are left unchanged.
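Since the question asks for a Tidyverse approach, the same pattern should also work with dplyr and stringr. This is only a sketch: the MainSchools data frame below is a stand-in built from the sample values above, not the real data.

```r
library(dplyr)
library(stringr)

# Stand-in for the MainSchools data frame from the question (sample values only)
MainSchools <- data.frame(Eircode = c("D15P820", "K78YD27", "D03P820"),
                          stringsAsFactors = FALSE)

# Replace the whole value with capture group 1 when it starts with D01-D24;
# values that do not match the pattern are returned unchanged
MainSchools <- MainSchools %>%
  mutate(Eircode = str_replace(Eircode,
                               "^(D(?:0[1-9]|1[0-9]|2[0-4])).*$",
                               "\\1"))

print(MainSchools)
```

str_replace only replaces the first match, which is enough here because the pattern is anchored with ^ and $ and so covers the whole value.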
