I have a dataframe like the one below:
a str_col
1 ABC*EFG
2 DDC/DSD
3 sew^sds
...
I want to split each string on its non-alphabetic characters into a list. The desired dataframe is:
a str_col new_col
1 ABC*EFG [ABC, EFG]
2 DDC/DSD [DDC, DSD]
3 sew^sds [sew, sds]
...
I tried df['str_col'].str.split('^[a-zA-Z]+'), but it produced something like ['', '*EFG'].
Solution:
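Use the negated character class [^a-zA-Z]+, which matches one or more non-letters. \W+ (roughly [^a-zA-Z0-9_]+ for ASCII text) should also work in your case, although it keeps digits and underscores inside the pieces: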
df['new_col'] = df['str_col'].str.split(r'[^a-zA-Z]+')  # split on runs of non-letters
df['new_col'] = df['str_col'].str.split(r'\W+')  # or: split on runs of non-word characters
Output:
a str_col new_col
0 1.0 ABC*EFG [ABC, EFG]
1 2.0 DDC/DSD [DDC, DSD]
2 3.0 sew^sds [sew, sds]
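A self-contained sketch of the answer, assuming pandas is installed and using the sample rows from the question (the "..." rows are omitted, and column a is built as integers). The column name new_col_w is hypothetical, used only to compare the two patterns side by side:

```python
import pandas as pd

# Sample data mirroring the question ("..." rows omitted)
df = pd.DataFrame({"a": [1, 2, 3], "str_col": ["ABC*EFG", "DDC/DSD", "sew^sds"]})

# Split on runs of one or more non-letters; Series.str.split treats a
# multi-character pattern as a regular expression
df["new_col"] = df["str_col"].str.split(r"[^a-zA-Z]+")

# \W+ splits on runs of non-word characters, so digits and underscores
# would stay inside the pieces; for these rows the result is identical
df["new_col_w"] = df["str_col"].str.split(r"\W+")

print(df["new_col"].tolist())  # [['ABC', 'EFG'], ['DDC', 'DSD'], ['sew', 'sds']]
```

The two patterns only diverge on strings that contain digits, underscores or (with \W) non-ASCII letters, so pick the one that matches what you consider a separator.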
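Your pattern ^[a-zA-Z]+ failed because, outside of a character class […], ^ anchors the match to the start of the string. It therefore matched the leading letters ABC and used them as the separator, leaving an empty string and *EFG. Only as the first character inside […] does ^ negate the class.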
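Python's built-in re module shows the difference directly (for the default object dtype, Series.str.split applies the same kind of regex split to each element). The string "*ABC" is a made-up example, not from the question:

```python
import re

s = "ABC*EFG"

# Outside [...], ^ anchors to the start of the string, so the pattern
# matches the leading letters "ABC" and uses them as the separator
anchored = re.split(r"^[a-zA-Z]+", s)
print(anchored)  # ['', '*EFG']

# As the first character inside [...], ^ negates the class, so the
# pattern matches runs of non-letters such as "*"
negated = re.split(r"[^a-zA-Z]+", s)
print(negated)  # ['ABC', 'EFG']

# With a leading separator, split yields an empty first element,
# while findall returns only the runs of letters themselves
edge_split = re.split(r"[^a-zA-Z]+", "*ABC")
edge_findall = re.findall(r"[a-zA-Z]+", "*ABC")
print(edge_split, edge_findall)  # ['', 'ABC'] ['ABC']
```

If your real data can start or end with a non-letter, df['str_col'].str.findall(r'[a-zA-Z]+') is an alternative to splitting that avoids those empty strings.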