Good morning,
I have exhaustively searched for how best to do two things in Python/Pandas, and have not yet found the answer.
I have a df such as:
| User | Role |
|---|---|
| Roger Dodger (rogerdodger) | user |
| Edwin Cullen (edwincullen) | user |
| Hunter Andrews (hunterandrews) | user |
I would like to iterate over the User column and keep only the text inside the parentheses, with a result such as:
| User | Role |
|---|---|
| rogerdodger | user |
| edwincullen | user |
| hunterandrews | user |
I’ve found many successful ways for iterating. I’ve not found a way to do the string edits cleanly. I’ve seen some regex suggestions but am not all that familiar with how to implement them based on the other examples given.
>Solution :
There are various ways to do that.
One way would be using pandas.Series.apply and a custom lambda function as follows
df['User'] = df['User'].apply(lambda x: x[x.find('(')+1:x.find(')')])
[Out]:
User Role
0 rogerdodger user
1 edwincullen user
2 hunterandrews user
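The lambda above assumes every value is a string containing both parentheses. If a row has none, `find` returns -1 and the slice silently drops the last character; a missing value (NaN) raises an AttributeError. As a minimal sketch of a more defensive variant, the helper `inside_parens` below (a hypothetical name, not part of pandas) leaves such rows unchanged. The sample DataFrame is rebuilt from the question's table.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "User": [
        "Roger Dodger (rogerdodger)",
        "Edwin Cullen (edwincullen)",
        "Hunter Andrews (hunterandrews)",
    ],
    "Role": ["user", "user", "user"],
})


def inside_parens(value):
    """Return the text between the first '(' and the next ')'.

    Hypothetical helper: non-string values and strings without a
    complete pair of parentheses are returned unchanged.
    """
    if not isinstance(value, str):
        return value
    start = value.find("(")
    end = value.find(")", start + 1)
    if start == -1 or end == -1:
        return value
    return value[start + 1:end]


df["User"] = df["User"].apply(inside_parens)
print(df)
```

Returning the value unchanged is only one possible choice; returning None instead would make unmatched rows easier to spot afterwards.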
Another way could be with pandas.Series.str.extract as follows
df['User'] = df['User'].str.extract(r'\((.*?)\)', expand=False)
[Out]:
User Role
0 rogerdodger user
1 edwincullen user
2 hunterandrews user
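With str.extract, rows that do not match the pattern come back as NaN rather than raising an error. A small sketch of that behaviour follows, using a made-up second row with no parentheses (not from the question) and `fillna` to fall back to the original text; whether a fallback is wanted at all depends on the data.

```python
import pandas as pd

# Second entry is an invented example of a row without a username in parentheses.
users = pd.Series(["Roger Dodger (rogerdodger)", "No Username Here"])

# Non-greedy capture of whatever sits between the first '(' and the next ')'.
extracted = users.str.extract(r"\((.*?)\)", expand=False)

# Rows without a match are NaN; fall back to the original value for those.
result = extracted.fillna(users)
print(result)
```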
Notes:
- If needed, one can also store the username in a different column, such as the column username, as follows
df['username'] = df['User'].str.extract(r'\((.*?)\)', expand=False)
[Out]:
User Role username
0 Roger Dodger (rogerdodger) user rogerdodger
1 Edwin Cullen (edwincullen) user edwincullen
2 Hunter Andrews (hunterandrews) user hunterandrews