
Remove last 3 characters from strings if they match a specific pattern in pandas

In my DataFrame, I have a lot of string values in Column A that are very inconsistent.

One thing I want to do: if the last 3 characters match a specific pattern, a dash (-) followed by two digits, I would like to remove the dash and the two digits.

So something like:

2X-VA-0561001-SBJ02-NI-01 would become 2X-VA-0561001-SBJ02-NI

Whereas something like:

A.2-FW-74174-KB02-0000232-HT would remain the same

Ideally, I'd like to create a new column, Column B, to hold these new values while keeping Column A.

I think something like this would work, based on something I've done previously, but I can't quite figure it out:

df['Column B'] = df['Column A'].str.replace(r'SOMETHING GOES HERE', '', regex=True)

Solution:

Use the regex -\d{2}$, where - matches the literal dash, \d{2} matches exactly two digits and $ anchors the match to the end of the string:

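import pandas as pd

df = pd.DataFrame({'Column A': ['2X-VA-0561001-SBJ02-NI-01',
                                'A.2-FW-74174-KB02-0000232-HT']})
# Remove a trailing "-" plus two digits; non-matching values are copied unchanged
df['Column B'] = df['Column A'].str.replace(r'-\d{2}$', '', regex=True)
print(df)
                       Column A                      Column B
0     2X-VA-0561001-SBJ02-NI-01        2X-VA-0561001-SBJ02-NI
1  A.2-FW-74174-KB02-0000232-HT  A.2-FW-74174-KB02-0000232-HT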
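If you prefer something that mirrors the title more literally (test the suffix first, then drop the last 3 characters), the sketch below builds a boolean mask and slices only the matching rows. It should produce the same Column B as the str.replace call. The extra rows ('ABC-001', 'ABC-1' and a missing value) are made-up samples that only illustrate which suffixes -\d{2}$ does and does not match, and the name mask is just illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Column A': ['2X-VA-0561001-SBJ02-NI-01',     # "-01": stripped
                                'A.2-FW-74174-KB02-0000232-HT',  # "-HT": kept
                                'ABC-001',                       # three digits: kept
                                'ABC-1',                         # one digit: kept
                                np.nan]})                        # missing: stays NaN

# True where the value ends with "-" plus exactly two digits; NaN counts as no match
mask = df['Column A'].str.contains(r'-\d{2}$', regex=True, na=False)

# Keep the original value where mask is False, otherwise drop the last 3 characters
df['Column B'] = df['Column A'].where(~mask, df['Column A'].str[:-3])
print(df)
```

Keeping mask around also tells you which rows were changed. One caveat: in Python's re module, $ also matches just before a final newline, so a value like 'X-01\n' would be flagged and then sliced incorrectly; \Z is the stricter end anchor if such values can occur.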
Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading