Splitting a dataframe column when the last word is not just letters

Say I have df as follows:

MyCol
Red Motor
Blue Taxi
Green Taxi-1
Light blue small Taxi-1
Light blue big Taxi-2

I would like to split the color and the vehicle into two columns.

The last word refers to the vehicle and may contain any non-space characters (for example Taxi or Taxi-1). Sometimes a 'big' or 'small' precedes the vehicle name and belongs with it. The words before that (one or more) refer to the color.
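The rule described above can be spelled out token by token before it is turned into a regular expression. The following is only a sketch in plain Python: the function name split_color_vehicle and the SIZE_WORDS set are illustrative names, not part of the original question.

```python
# Hypothetical helper illustrating the splitting rule; all names are assumptions.
SIZE_WORDS = {"small", "big"}

def split_color_vehicle(text):
    """Return (color, vehicle) for a value such as 'Light blue small Taxi-1'."""
    tokens = text.split()  # splits on whitespace and ignores leading/trailing spaces
    # The vehicle is the last token, plus an optional size word right before it.
    # At least one token must remain for the color.
    if len(tokens) >= 3 and tokens[-2] in SIZE_WORDS:
        cut = len(tokens) - 2
    else:
        cut = len(tokens) - 1
    return " ".join(tokens[:cut]), " ".join(tokens[cut:])

rows = ["Red Motor", "Blue Taxi", "Green Taxi-1",
        "Light blue small Taxi-1", "Light blue big Taxi-2"]
pairs = [split_color_vehicle(r) for r in rows]
```

Written this way, the size word is kept with the vehicle and everything to its left becomes the color, which is the behavior a regex-based split needs to reproduce.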


This is what I have tried. It only works when the last word contains no special characters. How can I also handle a last word that does contain special characters?

df['MyCol'].str.extract(r'^(.*?)\s((?:small|big)?\s?\w+).*$')
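To see where the attempt goes wrong, it may help to run the pattern on single values with Python's re module. In this sketch, \w+ matches only letters, digits and underscores, so it stops at the hyphen, and the trailing .* then consumes the rest outside the capture group. The same trailing .* also lets the lazy first group stop at the first space in a multi-word color. The variable names are illustrative.

```python
import re

# The pattern from the attempt above, compiled as a raw string.
attempt = re.compile(r'^(.*?)\s((?:small|big)?\s?\w+).*$')

# \w+ cannot cross the hyphen, so '-1' is swallowed by the trailing '.*'.
color, vehicle = attempt.match("Green Taxi-1").groups()

# With a multi-word color, the lazy first group stops at the first space.
multi = attempt.match("Light blue small Taxi-1").groups()

# \S+ matches any run of non-whitespace characters, hyphen included, and
# anchoring with $ forces the second group to reach the end of the string.
fixed = re.compile(r'^(.*?)\s((?:small|big)?\s?\S+)$')
fixed_simple = fixed.match("Green Taxi-1").groups()
fixed_multi = fixed.match("Light blue small Taxi-1").groups()
```

Here vehicle comes out as Taxi for the attempt, while the \S+ variant returns Taxi-1 and keeps the two-word color intact.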

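Solution:

Match the last word with \S+ (any run of non-whitespace characters) instead of \w+, which stops at the hyphen, and end the pattern with $ so that the vehicle group has to reach the end of the string: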

df['MyCol'].str.extract(r'^(.*?)\s((small|big|)\s?\S+)$')[[0, 1]]

resulting in:

            0             1
0         Red         Motor
1        Blue          Taxi
2       Green        Taxi-1
3  Light blue  small Taxi-1
4  Light blue    big Taxi-2
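For completeness, here is a self-contained sketch of the same idea that can be pasted into a session. It is a variation rather than the exact answer above: it uses a non-capturing group (?:small|big)? so there is no third column to drop, and named groups so the result columns get labels. The column names color and vehicle are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({"MyCol": ["Red Motor", "Blue Taxi", "Green Taxi-1",
                             "Light blue small Taxi-1", "Light blue big Taxi-2"]})

# Named groups become column names; (?:...) keeps the size word from
# producing a capture group (and therefore a column) of its own.
pattern = r'^(?P<color>.*?)\s(?P<vehicle>(?:small|big)?\s?\S+)$'

# str.strip() guards against stray leading/trailing whitespace, which would
# otherwise stop the $-anchored pattern from matching and yield NaN.
parts = df["MyCol"].str.strip().str.extract(pattern)

# Attach the two new columns to the original frame (indexes align).
df = df.join(parts)
```

Rows that do not match the pattern at all, such as a value with a single word, would come back as NaN in both new columns, so it may be worth checking parts.isna() on real data.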