I have a dataset and need to remove parentheses from some rows within a column.
test
(ABC)
ABC(DEF)G
ABC
Desired Output
test
ABC
DEF
ABC
This is what I tried:
df['test'] = df['test'].str.extract(r'\((.*)\)')
When I do this, it deletes the rows without parentheses altogether. Any suggestions? Thank you in advance.
>Solution :
You can use
df['test'].str.replace(r'.*?\((.*)\).*', r'\1', regex=True)
# or a bit more efficient
df['test'].str.replace(r'[^(]*\((.*)\).*', r'\1', regex=True)
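The idea is to match everything up to the first (, capture all text after it up to the last ), and then match the rest of the string. Replacing the whole match with the captured group removes all text outside the parentheses. Rows without parentheses do not match at all, so they are left unchanged.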
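To check the behaviour on the sample values from the question without building a DataFrame, here is a minimal sketch using Python's built-in re module. The helper name strip_outside_parens is made up for illustration, and the sketch assumes re.sub treats the pattern the same way str.replace with regex=True does for ordinary string columns.

```python
import re

# Second pattern from the answer: skip to the first "(", capture up to the
# last ")", then consume whatever follows.
PATTERN = r'[^(]*\((.*)\).*'

def strip_outside_parens(value):
    # Hypothetical helper: a value without parentheses does not match,
    # so re.sub returns it unchanged.
    return re.sub(PATTERN, r'\1', value)

samples = ['(ABC)', 'ABC(DEF)G', 'ABC']
results = [strip_outside_parens(s) for s in samples]
print(results)  # ['ABC', 'DEF', 'ABC']
```

The last value shows why this differs from str.extract: a non-matching row is passed through as-is instead of becoming a missing value.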
There can be more variations depending on the requirements.
Pattern details:
.*? - zero or more chars other than line break chars, as few as possible
\( - a ( char
(.*) - Group 1: any zero or more chars other than line break chars, as many as possible
\) - a ) char
.* - zero or more chars other than line break chars, as many as possible.
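As one possible variation (an assumption, not part of the answer above): if a value may contain several parenthesized groups and only the first one is wanted, the greedy (.*) can be swapped for [^()]*, which stops at the first closing parenthesis. The sketch below compares the two with re.sub; the names GREEDY and FIRST_PAIR are illustrative, and it assumes the parentheses are not nested.

```python
import re

# Pattern from the answer: Group 1 runs up to the LAST ")".
GREEDY = r'[^(]*\((.*)\).*'
# Assumed variation: Group 1 cannot contain parentheses, so it ends at the
# FIRST ")" after the first "(".
FIRST_PAIR = r'[^(]*\(([^()]*)\).*'

value = 'ABC(DEF)G(HIJ)K'
greedy_result = re.sub(GREEDY, r'\1', value)
first_pair_result = re.sub(FIRST_PAIR, r'\1', value)

print(greedy_result)      # DEF)G(HIJ
print(first_pair_result)  # DEF
```

Either pattern string can be passed to df['test'].str.replace(..., r'\1', regex=True) in the same way as the ones shown in the answer.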