Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Combing Two Columns into Single Column on New Rows in Pandas Dataframe

I had a dataframe that looked like this:

id stu1 stu2
11 James Rob
22 Bob Tom
33 Ryan Ian
44 Oliver Saj
55 Burt Ben
66 Chris Rich

To combine stu1 and stu2 into a single column with each name on a new row (and duplicating id for each name) I used this:

stu_df = stu_df.set_index('id').stack().reset_index(level=1, drop=True).to_frame('stu').reset_index()

Which gave this result:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

id stu
11 James
11 Rob
22 Bob
22 Tom
33 Ryan
33 Ian
44 Oliver
44 Saj
55 Burt
55 Ben
66 Chris
66 Rich

The original dataframe has been updated to include dates and looks like this:

date id stu1 stu2
03/06/2022 11 James Rob
03/06/2022 22 Bob Tom
03/06/2022 33 Ryan Ian
03/06/2022 44 Oliver Saj
03/06/2022 55 Burt Ben
03/06/2022 66 Chris Rich

I want to combine stu1 and stu2 in the same way (duplicating the date and id for each name) but can’t get it to work.

The desired outcome is this:

date id stu
03/06/2022 11 James
03/06/2022 11 Rob
03/06/2022 22 Bob
03/06/2022 22 Tom
03/06/2022 33 Ryan
03/06/2022 33 Ian
03/06/2022 44 Oliver
03/06/2022 44 Saj
03/06/2022 55 Burt
03/06/2022 55 Ben
03/06/2022 66 Chris
03/06/2022 66 Rich

I’m new to Pandas and am struggling. Can anyone help? Thanks

>Solution :

I would recommend using panda’s melt function as follows :

import pandas as pd

# a sample of your dataframe
df = pd.DataFrame({
    'date' : ['03/06/2022', '03/06/2022', '03/06/2022'],
    'id' : [11,22,33], 
    'stu1' : ['James', 'Bob', 'Ryan' ], 
    'stu2' : ['James', 'Rob', 'Tom' ]})

# melt the dataframe
df_melted = df.melt(id_vars=['date', 'id'], value_vars= ['stu1', 'stu2'])

print(df_melted)

>>>
       date     id  variable    value
0   03/06/2022  11  stu1        James
1   03/06/2022  22  stu1         Bob
2   03/06/2022  33  stu1         Ryan
3   03/06/2022  11  stu2        James
4   03/06/2022  22  stu2         Rob
5   03/06/2022  33  stu2         Tom

If you do not care at all about the column variable that helps identify the origin of the name (from which column it came) then you can simply drop it.

You can also customize the name of the output variables (instead of variable and value). Here’s an example to get your exact desired output :

df.melt(id_vars=['date', 'id'], value_vars= ['stu1', 'stu2'], value_name='stu').drop(['variable'], axis = 1)

Take a look at the original documentation pandas.melt for more info!

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading