Removing strings outside of parentheses in Python

June 6, 2023

I have a dataset and need to keep only the text inside the parentheses for the rows of a column that contain them. Rows without parentheses should stay as they are.

  test
 (ABC)
 ABC(DEF)G
 ABC

Desired Output

  test
  ABC
  DEF
  ABC

This is what I tried:

df['test'] = df['test'].str.extract(r'\((.*)\)')

When I do this, it deletes the rows without parentheses altogether. Any suggestions? Thank you in advance.

>Solution :

You can use

df['test'] = df['test'].str.replace(r'.*?\((.*)\).*', r'\1', regex=True)
# or, a bit more efficient
df['test'] = df['test'].str.replace(r'[^(]*\((.*)\).*', r'\1', regex=True)

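The point is to match from the start of the string up to the first (, capture all text after it up to the last ), match the rest of the string, and then replace the whole match with Group 1 (\1). That removes all text outside of the parentheses. A string with no parentheses does not match at all, so it is left unchanged.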
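As a minimal sketch, assuming the column holds the three sample values from the question (the names `df` and `cleaned` are only illustrative), the replacement can be checked like this:

```python
import pandas as pd

# Sample data taken from the question
df = pd.DataFrame({"test": ["(ABC)", "ABC(DEF)G", "ABC"]})

# Replace the whole match with Group 1.
# Rows without parentheses do not match, so they are returned as they are.
cleaned = df["test"].str.replace(r"[^(]*\((.*)\).*", r"\1", regex=True)

print(cleaned.tolist())  # ['ABC', 'DEF', 'ABC']
```

Note that `str.replace` returns a new Series, so the result has to be assigned back to `df["test"]` if the column itself should change.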


There can be more variations depending on the requirements.
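One such variation, shown here only as a sketch: if a value can contain more than one parenthesized part, the greedy Group 1 runs to the last ), while a lazy Group 1 stops at the first one. The input `A(B)C(D)E` and the names `GREEDY` and `LAZY` are hypothetical, and plain `re.sub` is used to keep the example small.

```python
import re

GREEDY = r"[^(]*\((.*)\).*"   # Group 1 extends to the last ")"
LAZY = r"[^(]*\((.*?)\).*"    # Group 1 stops at the first ")"

sample = "A(B)C(D)E"  # hypothetical value with two parenthesized parts

greedy_result = re.sub(GREEDY, r"\1", sample)
lazy_result = re.sub(LAZY, r"\1", sample)

print(greedy_result)  # B)C(D
print(lazy_result)    # B
```

With `regex=True`, pandas `str.replace` should accept the same patterns, so either one can be dropped into the calls above depending on which behavior is wanted.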

Pattern details:

.*? – zero or more chars other than line break chars, as few as possible
\( – a ( char
(.*) – Group 1: any zero or more chars other than line break chars, as many as possible
\) – a ) char
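.* – zero or more chars other than line break chars, as many as possible

In the second pattern, the lazy .*? is replaced with [^(]* – zero or more chars other than (, as many as possible. It reaches the first ( in a single greedy step instead of expanding one char at a time.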
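To see which piece of the pattern consumes which part of a string, the parts outside the parentheses can be captured as well. This is only an inspection aid, not part of the solution. The name `BREAKDOWN` and the two extra groups are assumptions added for illustration.

```python
import re

# Same pieces as the first pattern, with the leading and trailing
# parts wrapped in their own groups so they can be inspected.
BREAKDOWN = re.compile(r"(.*?)\((.*)\)(.*)")

for value in ["(ABC)", "ABC(DEF)G", "ABC"]:
    m = BREAKDOWN.fullmatch(value)
    print(value, "->", m.groups() if m else "no match")

# (ABC) -> ('', 'ABC', '')
# ABC(DEF)G -> ('ABC', 'DEF', 'G')
# ABC -> no match
```

The middle group is the one the replacement keeps. The "no match" case is why values without parentheses come through unchanged.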