Say I have df as follows:
MyCol
Red Motor
Blue Taxi
Green Taxi-1
Light blue small Taxi-1
Light blue big Taxi-2
I would like to split the color and the vehicle into two columns. I used this command to split the last word (could be any character).
The last word (could be any character, like taxi or taxi-1) refers to the vehicle. Sometimes, there is a ‘big’ or ‘small’ associated with the car name. The first few words (can be one or more than one words) refers to the color.
This is what I have tried. It only works when the last word is a word without special characters. How can I include the case when special characters in the last word too?
df['MyCol'].str.extract('^(.*?)\s((?:small|big)?\s?\w+).*$')
>Solution :
df['MyCol'].str.extract('^(.*?)\s((small|big|)\s?\S+)$')[[0, 1]]
resulting in:
| 0 | 1 | |
|---|---|---|
| 0 | Red | Motor |
| 1 | Blue | Taxi |
| 2 | Green | Taxi-1 |
| 3 | Light blue | small Taxi-1 |
| 4 | Light blue | big Taxi |