Removing strings outside of parentheses in Python

I have a dataset and, for some rows within a column, need to keep only the text inside the parentheses, dropping the parentheses and everything outside them.

  test
 (ABC)
 ABC(DEF)G
 ABC

Desired Output

  test
  ABC
  DEF
  ABC

This is what I tried:

df['test'] = df['test'].str.extract(r'\((.*)\)')

When I do this, it wipes out the rows without parentheses altogether. Any suggestions? Thank you in advance.

Solution:

You can use

df['test'] = df['test'].str.replace(r'.*?\((.*)\).*', r'\1', regex=True)
# or, a bit more efficient
df['test'] = df['test'].str.replace(r'[^(]*\((.*)\).*', r'\1', regex=True)

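The pattern matches the entire string: it moves to the first (, captures all text up to the last ) in Group 1, and then consumes the rest of the string. Replacing the whole match with \1 leaves only the text that was inside the parentheses. Rows without parentheses do not match, so they are left unchanged.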
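As a quick check, here is a minimal sketch that applies the second pattern to the sample column from the question. The DataFrame is rebuilt by hand, and the `result` variable is only there for illustration.

```python
import pandas as pd

# Sample data mirroring the column shown in the question
df = pd.DataFrame({"test": ["(ABC)", "ABC(DEF)G", "ABC"]})

# Match the whole string and keep only Group 1 (the text inside the parentheses).
# Rows with no parentheses do not match, so they pass through unchanged.
df["test"] = df["test"].str.replace(r"[^(]*\((.*)\).*", r"\1", regex=True)

result = df["test"].tolist()
print(result)  # ['ABC', 'DEF', 'ABC']
```

Unlike str.extract, which yields NaN for rows that do not match, str.replace returns non-matching rows as they were.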

Other variations are possible depending on the requirements. For example, making the capture lazy, (.*?), stops at the first ) instead of the last one.
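To illustrate how such variations can differ, the sketch below uses Python's built-in re module on a made-up string with two parenthesized groups (the input is an assumption, not taken from the question). It compares the greedy capture, a lazy capture, and collecting every group with re.findall.

```python
import re

s = "A(B)C(D)E"  # hypothetical input with two parenthesized groups

# Greedy capture: from the first "(" to the last ")"
greedy = re.sub(r"[^(]*\((.*)\).*", r"\1", s)

# Lazy capture: from the first "(" to the first ")"
lazy = re.sub(r"[^(]*\((.*?)\).*", r"\1", s)

# Every non-nested parenthesized group, as a list
all_groups = re.findall(r"\(([^()]*)\)", s)

print(greedy)      # B)C(D
print(lazy)        # B
print(all_groups)  # ['B', 'D']
```

Which variant fits depends on whether a row can contain more than one pair of parentheses and which part should be kept.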

Pattern details:

  • .*? – zero or more chars other than line break chars, as few as possible (the second pattern uses [^(]* instead: zero or more chars other than a ( char)
  • \( – a ( char
  • (.*) – Group 1: any zero or more chars other than line break chars, as many as possible
  • \) – a ) char
  • .* – zero or more chars other than line break chars, as many as possible.
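The breakdown above can be checked by giving each part of the pattern a name and inspecting what it consumes. This is only an illustrative sketch: the named groups before, inside and after are additions that are not part of the original pattern, and it uses the standard re module rather than pandas.

```python
import re

# Same structure as the first pattern, with hypothetical named groups
# added so each part's match can be inspected.
pattern = re.compile(r"(?P<before>.*?)\((?P<inside>.*)\)(?P<after>.*)")

m = pattern.fullmatch("ABC(DEF)G")
parts = m.groupdict()
print(parts)  # {'before': 'ABC', 'inside': 'DEF', 'after': 'G'}

# A string without parentheses does not match at all,
# which is why such rows are left untouched by the replacement.
no_match = pattern.fullmatch("ABC")
print(no_match)  # None
```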