Home Converting pandas df from long to wide with duplicated rows and categorical variables as values

Questions

Converting pandas df from long to wide with duplicated rows and categorical variables as values

August 25, 2022

I have a simple dataframe

{'ID': {0: 101, 1: 101, 2: 101, 3: 102, 4: 102, 5: 102, 6: 102, 7: 102, 8: 103, 9: 103}, 'Category': {0: 'A', 1: 'B', 2: 'C', 3: 'A', 4: 'A', 5: 'A', 6: 'B', 7: 'B', 8: 'A', 9: 'B'}}

You can see that ID has duplicates and gives me a reshaping error.

I want to convert if from wide to long and keep values in the Category column as a new column (I will have max 5 categories per ID, but some will have less than 5 and should have NaN. Plus want to rename columns to Role1, Role2 etc.

Expected output

ID  Role1   Role2   Role3   Role4   Role5
101 A       B       C       NaN     NaN
102 A       A       A       B       B
103 A       B       NaN     NaN     NaN

>Solution :

You can assign an enumeration within each ID and query that before pivoting:

N = 5 
(df.assign(role_enum=df.groupby('ID').cumcount()+1)
   .query('role_enum<=@N')
   .pivot(index='ID', columns='role_enum', values='Category')
   .add_prefix('Role').reset_index()   # book-keeping
)

Output:

role_enum   ID Role1 Role2 Role3 Role4 Role5
0          101     A     B     C   NaN   NaN
1          102     A     A     A     B     B
2          103     A     B   NaN   NaN   NaN