Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Transforming many columns into 3 column categories which contains lists?

I have a DataFrame with 31 columns, which contains 3 categories "Classic", "Premium" and "Luxe" I want to swap the way the DataFrame works to have only 3 comumns "Classic", "Premium" and "Luxe" and 31 categories which can be listed inside.

Since I’m new I can only post a link to the picture for more clarity :
Here is what I have and what I want to do

Here’s what I tried so far:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Suppose we use this DataFrame

import pandas as pd
import numpy as np

d = {'Name': ["Alban","Benan","Conor","Dino","Egor","Fatima" ],'Virtual visits': ["Classique", "Classique","","" ,"Premium","Premium"], 'Cars' :["","" ,"Luxe","Luxe","Luxe", "Premium"],'Portraits' :["","" ,"Classique","Classique","Luxe", "Premium"],'Animals' :["Premium","Luxe" ,"","","Luxe", ""]}
df = pd.DataFrame(data=d)
df

    Name    Virtual visits  Cars       Portraits    Animals
0   Alban   Classique                  Premium
1   Benan   Classique                  Luxe
2   Conor   Luxe            Classique   
3   Dino    Luxe            Classique   
4   Egor    Premium         Luxe       Luxe         Luxe
5   Fatima  Premium         Premium    Premium  

Using melt looked promising so I did that and a groupBy:

df = df.melt(id_vars=["Name"])
df.groupby(by=["Name"]).sum()

Name    variable                            value   
Alban   Virtual visitsCarsPortraitsAnimals  ClassiquePremium
Benan   Virtual visitsCarsPortraitsAnimals  ClassiqueLuxe
Conor   Virtual visitsCarsPortraitsAnimals  LuxeClassique
Dino    Virtual visitsCarsPortraitsAnimals  LuxeClassique
Egor    Virtual visitsCarsPortraitsAnimals  PremiumLuxeLuxeLuxe
Fatima  Virtual visitsCarsPortraitsAnimals  PremiumPremiumPremium

So the column "Variable" is made up of all the column names, which is not bad but instead of a single string added to one another, I would like to have a list, and instead of columns Variable and Value, I would need to have 3 columns, "Premium", "Classque", "Luxe" and in each of them have the relevant list of categories appear.

Thanks for your help.

>Solution :

You were right with melt. After that you need a pivot table:

out = df.replace('',np.nan).melt("Name")\
.pivot_table(index="Name",columns="value",values="variable",aggfunc=','.join)\
.rename_axis(None,axis=1).reset_index()

print(out)

     Name       Classique                    Luxe  \
0   Alban  Virtual visits                     NaN   
1   Benan  Virtual visits                 Animals   
2   Conor       Portraits                    Cars   
3    Dino       Portraits                    Cars   
4    Egor             NaN  Cars,Portraits,Animals   
5  Fatima             NaN                     NaN   

                         Premium  
0                        Animals  
1                            NaN  
2                            NaN  
3                            NaN  
4                 Virtual visits  
5  Virtual visits,Cars,Portraits  
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading