pandas: split a string column on any character that is not a letter

I have a dataframe like the one below:

a    str_col
1    ABC*EFG
2    DDC/DSD
3    sew^sds
...

I want to split each string on non-alphabetic characters and store the pieces in a list. The desired dataframe is as follows:

a    str_col    new_col
1    ABC*EFG    [ABC, EFG]
2    DDC/DSD    [DDC, DSD]
3    sew^sds    [sew, sds]
...

I’ve tried

  • df['str_col'].str.split('^[a-zA-Z]+'), but it produced something like [, *EFG]

Solution:

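You can split on [^a-zA-Z]+ (one or more characters that are not ASCII letters), or on \W+ (one or more non-word characters; for ASCII text, \W is equivalent to [^a-zA-Z0-9_]). The second pattern also works for your data, but it does not split on digits or underscores: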

# Split on runs of characters that are not ASCII letters
df['new_col'] = df['str_col'].str.split(r'[^a-zA-Z]+')

# Or split on runs of non-word characters
df['new_col'] = df['str_col'].str.split(r'\W+')

Output:

     a  str_col     new_col
0  1.0  ABC*EFG  [ABC, EFG]
1  2.0  DDC/DSD  [DDC, DSD]
2  3.0  sew^sds  [sew, sds]
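
For anyone who wants to reproduce this, here is a minimal sketch. The dataframe is rebuilt by hand from the question, and the trailing space on the third value is an assumption added for illustration, to show that splitting leaves an empty string when a value ends (or starts) with a non-letter. The `letters` column and the use of `str.findall` are not part of the original answer; they are one possible way to avoid those empty strings.

```python
import pandas as pd

# Sample data rebuilt from the question; the trailing space in the last
# value is an illustrative assumption, not something the asker confirmed.
df = pd.DataFrame({
    "a": [1, 2, 3],
    "str_col": ["ABC*EFG", "DDC/DSD", "sew^sds "],
})

# Split on runs of non-letters (a multi-character pattern is treated as a regex).
df["new_col"] = df["str_col"].str.split(r"[^a-zA-Z]+")

# Alternative: collect the runs of letters directly, which never yields
# empty strings for leading or trailing separators.
df["letters"] = df["str_col"].str.findall(r"[a-zA-Z]+")

print(df)
```

If your real data can start or end with a separator, the `findall` variant may be the safer choice; otherwise the two give the same lists.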

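^[a-zA-Z]+ failed because, outside of [...], ^ anchors the match to the start of the string. The pattern therefore matched only the leading run of letters and split there, leaving an empty string followed by the rest of the value. As the first character inside [...], ^ instead negates the character class, so [^a-zA-Z] matches any character that is not a letter.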
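
The two meanings of ^ can be checked with the standard `re` module alone. This is a small sketch, not part of the original answer; the string `mixed` is a made-up value used only to show how \W+ and [^a-zA-Z]+ differ once digits or underscores appear.

```python
import re

s = "ABC*EFG"

# Outside brackets, ^ anchors at the start: only the leading "ABC" matches,
# so the split leaves an empty string plus the remainder.
anchored = re.split(r"^[a-zA-Z]+", s)
print(anchored)    # ['', '*EFG']

# Inside brackets, a leading ^ negates the class: split on non-letters.
negated = re.split(r"[^a-zA-Z]+", s)
print(negated)     # ['ABC', 'EFG']

# Hypothetical value with a digit and an underscore in it.
mixed = "ab1_cd*ef"

# \W treats digits and underscores as word characters, so they stay attached.
non_word = re.split(r"\W+", mixed)
print(non_word)    # ['ab1_cd', 'ef']

# [^a-zA-Z] treats them as separators.
non_letter = re.split(r"[^a-zA-Z]+", mixed)
print(non_letter)  # ['ab', 'cd', 'ef']
```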
