Home Unable to split the column into multiple columns based on the first column value

Questions

Unable to split the column into multiple columns based on the first column value

July 11, 2022

I’ve a data frame which contains one column. Below is the example

Questionsbysortorder        
        
Q1-4,Q2-3,Q3-2,Q4-3,Q5-3            
Q1-1,Q2-2,Q3-1,Q4-1             
Q1-5,Q2-3,Q3-3

I’m trying to explode the columns with the help of already given row values. Like below is the example

Questionsbysortorder        Q1    Q2   Q3   Q4   Q5
             
Q1-4,Q2-3,Q3-2,Q4-3,Q5-3    4     3    2    3    3  
Q1-1,Q2-2,Q3-1,Q4-1         1     2    1    1    NA 
Q1-5,Q2-3,Q5-3              5     3    NA   NA   3

Below is the code which i tried, but it’s returning an error

import pandas as pd
import numpy as np

df = pd.DataFrame({'Questionsbysortorder': ['Q1-4,Q2-3,Q3-2,Q4-3,Q5-3', 'Q1-1,Q2-2,Q3-1,Q4-1','Q1-5,Q2-3,Q5-3']})

df['Questionsbysortorder'] = df['Questionsbysortorder'].str.split(',')

df = df.explode('Questionsbysortorder')

df['Questionsbysortorder'] = df['Questionsbysortorder'].str.split('-')

df = df.explode('Questionsbysortorder')

df = df.set_index('Questionsbysortorder').unstack().reset_index()

df.columns = ['Questionsbysortorder', 'value']

df = df.pivot(index='Questionsbysortorder', columns='value', values='Questionsbysortorder')

df.columns.name = None

print(df)

Error is:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-30dd8b8d4f59> in <module>()
     14 df = df.set_index('Questionsbysortorder').unstack().reset_index()
     15 
---> 16 df.columns = ['Questionsbysortorder', 'value']
     17 
     18 df = df.pivot(index='Questionsbysortorder', columns='value', values='Questionsbysortorder')

4 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/internals/base.py in _validate_set_axis(self, axis, new_labels)
     56         elif new_len != old_len:
     57             raise ValueError(
---> 58                 f"Length mismatch: Expected axis has {old_len} elements, new "
     59                 f"values have {new_len} elements"
     60             )

ValueError: Length mismatch: Expected axis has 3 elements, new values have 2 elements

Can anyone please help me with this?

>Solution :

You are very close. You want to

split by ',',
explode the list,
split again by '-' to get the different fields
finally pivot the data

In code:

df.join(df.Questionsbysortorder.str.split(',')
   .explode()
   .str.split('-', expand=True)
   .set_index(0, append=True)[1]
   .unstack()
)

Output:

       Questionsbysortorder Q1 Q2 Q3   Q4   Q5
0  Q1-4,Q2-3,Q3-2,Q4-3,Q5-3  4  3  2    3    3
1       Q1-1,Q2-2,Q3-1,Q4-1  1  2  1    1  NaN
2            Q1-5,Q2-3,Q3-3  5  3  3  NaN  NaN