Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Splitting pandas column with two delimiters creates extra empty columns

I load in data from a CSV, one of the columns has this format:

!Color1:Color2:Color3!

!White:Green:Black!
!Green:Blue:Yellow!!Red:Brown:Blue!!White:Green:Black!
!Green:Blue:Yellow!!White:Green:Black!
!Red:Brown:Blue!!White:Green:Black!

I want to discard all of the other columns, pick this one out, then split this one into this:

     0                      1                     2
0    White:Green:Black      None                  None
1    Green:Blue:Yellow      Red:Brown:Blue        White:Green:Black
2    Green:Blue:Yellow      White:Green:Black     None
3    Red:Brown:Blue         White:Green:Black     None

Below is how I tried to do it:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df = pd.read_csv(csv_path, index_col=False)
new_df = df['!Color1:Color2:Color3!'].str.split('!', expand=True)

But it ends up like this:

     0        1                      2          3                     4         5
0    None     White:Green:Black      None       None                  None      None
1    None     Green:Blue:Yellow      None       Red:Brown:Blue        None      White:Green:Black
2    None     Green:Blue:Yellow      None       White:Green:Black     None      None
3    None     Red:Brown:Blue         None       White:Green:Black     None      None

So it interprets the first "!" as a field of its own, and so it adds empty fields between the "parts".

Bonus question:
After that is achieved, how do I pick out the middle color in each column, like this?:

     0           1           2
0    Green       None        None
1    Blue        Brown       Green
2    Blue        Green       None
3    Brown       Green       None

>Solution :

Add Series.str.strip for avoid last and first columns filled by empty strings and regex !{1,} for split 1 or multiple !:

new_df = df['!Color1:Color2:Color3!'].str.strip('!').str.split('!{1,}', expand=True)
print (new_df)
                   0                  1                  2
0  White:Green:Black               None               None
1  Green:Blue:Yellow     Red:Brown:Blue  White:Green:Black
2  Green:Blue:Yellow  White:Green:Black               None
3     Red:Brown:Blue  White:Green:Black               None

Also if need second splitted values by : use custom lambda function:

new_df = (df['!Color1:Color2:Color3!']
               .str.strip('!')
               .str.split('!{1,}', expand=True)
               .apply(lambda x: x.str.split(':').str[1]))
print (new_df)
       0      1      2
0  Green   None   None
1   Blue  Brown  Green
2   Blue  Green   None
3  Brown  Green   None
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading