Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Converting pandas df from long to wide with duplicated rows and categorical variables as values

I have a simple dataframe

{'ID': {0: 101, 1: 101, 2: 101, 3: 102, 4: 102, 5: 102, 6: 102, 7: 102, 8: 103, 9: 103}, 'Category': {0: 'A', 1: 'B', 2: 'C', 3: 'A', 4: 'A', 5: 'A', 6: 'B', 7: 'B', 8: 'A', 9: 'B'}}

You can see that ID has duplicates and gives me a reshaping error.

I want to convert if from wide to long and keep values in the Category column as a new column (I will have max 5 categories per ID, but some will have less than 5 and should have NaN. Plus want to rename columns to Role1, Role2 etc.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Expected output

ID  Role1   Role2   Role3   Role4   Role5
101 A       B       C       NaN     NaN
102 A       A       A       B       B
103 A       B       NaN     NaN     NaN

>Solution :

You can assign an enumeration within each ID and query that before pivoting:

N = 5 
(df.assign(role_enum=df.groupby('ID').cumcount()+1)
   .query('role_enum<=@N')
   .pivot(index='ID', columns='role_enum', values='Category')
   .add_prefix('Role').reset_index()   # book-keeping
)

Output:

role_enum   ID Role1 Role2 Role3 Role4 Role5
0          101     A     B     C   NaN   NaN
1          102     A     A     A     B     B
2          103     A     B   NaN   NaN   NaN
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading