Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Unable to split the column into multiple columns based on the first column value

I’ve a data frame which contains one column. Below is the example

Questionsbysortorder        
        
Q1-4,Q2-3,Q3-2,Q4-3,Q5-3            
Q1-1,Q2-2,Q3-1,Q4-1             
Q1-5,Q2-3,Q3-3  

I’m trying to explode the columns with the help of already given row values. Like below is the example

Questionsbysortorder        Q1    Q2   Q3   Q4   Q5
             
Q1-4,Q2-3,Q3-2,Q4-3,Q5-3    4     3    2    3    3  
Q1-1,Q2-2,Q3-1,Q4-1         1     2    1    1    NA 
Q1-5,Q2-3,Q5-3              5     3    NA   NA   3

Below is the code which i tried, but it’s returning an error

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

import pandas as pd
import numpy as np

df = pd.DataFrame({'Questionsbysortorder': ['Q1-4,Q2-3,Q3-2,Q4-3,Q5-3', 'Q1-1,Q2-2,Q3-1,Q4-1','Q1-5,Q2-3,Q5-3']})

df['Questionsbysortorder'] = df['Questionsbysortorder'].str.split(',')

df = df.explode('Questionsbysortorder')

df['Questionsbysortorder'] = df['Questionsbysortorder'].str.split('-')

df = df.explode('Questionsbysortorder')

df = df.set_index('Questionsbysortorder').unstack().reset_index()

df.columns = ['Questionsbysortorder', 'value']

df = df.pivot(index='Questionsbysortorder', columns='value', values='Questionsbysortorder')

df.columns.name = None

print(df)

Error is:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-30dd8b8d4f59> in <module>()
     14 df = df.set_index('Questionsbysortorder').unstack().reset_index()
     15 
---> 16 df.columns = ['Questionsbysortorder', 'value']
     17 
     18 df = df.pivot(index='Questionsbysortorder', columns='value', values='Questionsbysortorder')

4 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/internals/base.py in _validate_set_axis(self, axis, new_labels)
     56         elif new_len != old_len:
     57             raise ValueError(
---> 58                 f"Length mismatch: Expected axis has {old_len} elements, new "
     59                 f"values have {new_len} elements"
     60             )

ValueError: Length mismatch: Expected axis has 3 elements, new values have 2 elements

Can anyone please help me with this?

>Solution :

You are very close. You want to

  • split by ',',
  • explode the list,
  • split again by '-' to get the different fields
  • finally pivot the data

In code:

df.join(df.Questionsbysortorder.str.split(',')
   .explode()
   .str.split('-', expand=True)
   .set_index(0, append=True)[1]
   .unstack()
)

Output:

       Questionsbysortorder Q1 Q2 Q3   Q4   Q5
0  Q1-4,Q2-3,Q3-2,Q4-3,Q5-3  4  3  2    3    3
1       Q1-1,Q2-2,Q3-1,Q4-1  1  2  1    1  NaN
2            Q1-5,Q2-3,Q3-3  5  3  3  NaN  NaN
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading