Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Regex: if string ends with numerical values – drop it, OR if string ends with single letter – drop it, OR leave it

Here is a sample of a dataset I have:

ID Project
1 21st Townhouse 318
2 The Residences 6
3 Villanova Tower B
4 The Hills H
5 City Park

I need to transform ‘Project’ column so that:

  1. if a row ends with numeric values, they should be dropped
  2. if a row ends with single letter, it should be dropped
  3. else, leave as it is

Here is how I want it to look like:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

ID Project
1 21st Townhouse
2 The Residences
3 Villanova Tower
4 The Hills
5 City Park

I tried to search for some solution, and found this(for first condition with numeric values only):

df['Project']=df.Project[~((df.Project.astype(str).str.match("(.*\d)")) & (df.Project.astype(str).str.len() > 1))] 

It worked, however, I tried to apply it for the second condition as well:

df['Project']=df.Project[~((df.Project.astype(str).str.match("(.*\w)")) & (df.Project.astype(str).str.len() == 1))] 

But, It failed

Can you help me, please?
Thank you!

>Solution :

You can use

df['Project']=df['Project'].str.replace(r'\s+(?:\d+|[A-Za-z])$', '', regex=True)

See the regex demo.

Details:

  • \s+ – one or more whitespaces
  • (?:\d+|[A-Za-z]) – a non-capturing group matching either
    • \d+ – one or more digits
    • | – or
    • [A-Za-z] – an ASCII letter (use [^\W\d_] to match any Unicode letter)
  • $ – end of string.

If your strings have no whitespace before the last number/letter, you may use

df['Project'].str.replace(r'\b(?:\d+|[A-Za-z])$', '', regex=True).str.rstrip()

The \b matches a word boundary making sure there is no letter, digit or _ right before the final digits/letter, and .str.rstrip() will strip the trailing whitespace.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading