I have a dataframe where, in one of the columns, I only want to keep a subset of each string. In the example below I only want to keep the people's names.
**Example:**
column 1
1.Joe Smith, NYC(212)
2.Jane Doe, HOU(713)
To remove everything to the left of the name I have used df['column1'] = df['column1'].str.lstrip("0123456789.")
This worked successfully. But removing everything from the comma onward, so that only the name is left, is what I can't figure out. Would a regex be better suited here?
Thanks!
>Solution :
Try using a regex to extract the names:
df['column1'].str.extract(r'\d+\.(.+?),')
Output:
0 Joe Smith
1 Jane Doe
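For a runnable version, here is a minimal sketch that rebuilds the two example rows and writes the result to a new column. The column name `name` is only an illustration, and `expand=False` is added so that `str.extract` returns a Series rather than a one-column DataFrame.

```python
import pandas as pd

# Rebuild the sample data from the question.
df = pd.DataFrame({"column1": ["1.Joe Smith, NYC(212)", "2.Jane Doe, HOU(713)"]})

# expand=False returns a Series for a single capture group,
# which can be assigned to a column directly.
# "name" is a hypothetical column name chosen for this example.
df["name"] = df["column1"].str.extract(r"\d+\.(.+?),", expand=False)

names = df["name"].tolist()
print(names)  # ['Joe Smith', 'Jane Doe']
```

Rows that do not match the pattern (for example, a row without a comma) come back as NaN, so it may be worth checking for missing values before overwriting the original column.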
More details on the pattern:
\d+: Match one or more digits.
\.: Match a period (dot) character.
(.+?): Capture one or more characters (non-greedy) into a group.
,: Match a comma character.
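To see how the pieces of the pattern behave outside pandas, here is a small sketch using Python's built-in `re` module. The helper `extract_name` and the extra sample strings are made up for illustration; they are not part of the original data.

```python
import re

# Same pattern as above: digits, a literal dot, a lazy capture, then a comma.
PATTERN = re.compile(r"\d+\.(.+?),")


def extract_name(text):
    """Return the captured name, or None when the pattern does not match."""
    match = PATTERN.search(text)
    return match.group(1) if match else None


samples = [
    "1.Joe Smith, NYC(212)",
    "2.Jane Doe, HOU(713)",
    "3.No Comma Here",          # hypothetical row with no comma
    "4.Ann Lee, Jr., LA(310)",  # hypothetical row with two commas
]
results = [extract_name(s) for s in samples]
print(results)  # ['Joe Smith', 'Jane Doe', None, 'Ann Lee']

# For comparison, a greedy group runs up to the LAST comma instead of the first.
greedy = re.search(r"\d+\.(.+),", samples[3]).group(1)
print(greedy)  # Ann Lee, Jr.
```

The non-greedy `.+?` is what makes the capture stop at the first comma, which matters only when a row contains more than one comma.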