Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

melt() function duplicating dataset

I have a table like this:

id name doggo floofer puppo pupper
1 rowa NaN NaN NaN NaN
2 ray NaN NaN NaN NaN
3 emma NaN NaN NaN pupper
4 sophy doggo NaN NaN NaN
5 jack NaN NaN NaN NaN
6 jimmy NaN NaN puppo NaN
7 bingo NaN NaN NaN NaN
8 billy NaN NaN NaN pupper
9 tiger NaN floofer NaN NaN
10 lucy NaN NaN NaN NaN

and I want the (doggo, floofer, puppo, pupper) columns to be in a single category column (dog_type).

Note:The NaN should also be NaN in the column since not all the dogs were categorized.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

But after using:
df1 = df.melt(id_vars = ['id', 'name'], value_vars = ['doggo', 'floofer', 'pupper', 'puppo'], var_name = 'dog_types', ignore_index = True)

The melted df is now duplicated to 40 rows new df screenshot here
eww, an image

Pls how I do get the correct results without duplicates

>Solution :

df['dog_types'] = (df['doggo'].fillna(df['floofer'])
                              .fillna(df['puppo'])
                              .fillna(df['pupper']))
   id   name  doggo  floofer  puppo  pupper dog_types
0   1   rowa    NaN      NaN    NaN     NaN       NaN
1   2    ray    NaN      NaN    NaN     NaN       NaN
2   3   emma    NaN      NaN    NaN  pupper    pupper
3   4  sophy  doggo      NaN    NaN     NaN     doggo
4   5   jack    NaN      NaN    NaN     NaN       NaN
5   6  jimmy    NaN      NaN  puppo     NaN     puppo
6   7  bingo    NaN      NaN    NaN     NaN       NaN
7   8  billy    NaN      NaN    NaN  pupper    pupper
8   9  tiger    NaN  floofer    NaN     NaN   floofer
9  10   lucy    NaN      NaN    NaN     NaN       NaN

Afterwards you can drop redundant columns:

df.drop(columns=['doggo', 'floofer', 'puppo', 'pupper'], inplace=True)
   id   name dog_types
0   1   rowa       NaN
1   2    ray       NaN
2   3   emma    pupper
3   4  sophy     doggo
4   5   jack       NaN
5   6  jimmy     puppo
6   7  bingo       NaN
7   8  billy    pupper
8   9  tiger   floofer
9  10   lucy       NaN
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading