Home Combing Two Columns into Single Column on New Rows in Pandas Dataframe

Questions

Combing Two Columns into Single Column on New Rows in Pandas Dataframe

June 3, 2022

I had a dataframe that looked like this:

id	stu1	stu2
11	James	Rob
22	Bob	Tom
33	Ryan	Ian
44	Oliver	Saj
55	Burt	Ben
66	Chris	Rich

To combine stu1 and stu2 into a single column with each name on a new row (and duplicating id for each name) I used this:

stu_df = stu_df.set_index('id').stack().reset_index(level=1, drop=True).to_frame('stu').reset_index()

Which gave this result:

id	stu
11	James
11	Rob
22	Bob
22	Tom
33	Ryan
33	Ian
44	Oliver
44	Saj
55	Burt
55	Ben
66	Chris
66	Rich

The original dataframe has been updated to include dates and looks like this:

date	id	stu1	stu2
03/06/2022	11	James	Rob
03/06/2022	22	Bob	Tom
03/06/2022	33	Ryan	Ian
03/06/2022	44	Oliver	Saj
03/06/2022	55	Burt	Ben
03/06/2022	66	Chris	Rich

I want to combine stu1 and stu2 in the same way (duplicating the date and id for each name) but can’t get it to work.

The desired outcome is this:

date	id	stu
03/06/2022	11	James
03/06/2022	11	Rob
03/06/2022	22	Bob
03/06/2022	22	Tom
03/06/2022	33	Ryan
03/06/2022	33	Ian
03/06/2022	44	Oliver
03/06/2022	44	Saj
03/06/2022	55	Burt
03/06/2022	55	Ben
03/06/2022	66	Chris
03/06/2022	66	Rich

I’m new to Pandas and am struggling. Can anyone help? Thanks

>Solution :

I would recommend using panda’s melt function as follows :

import pandas as pd

# a sample of your dataframe
df = pd.DataFrame({
    'date' : ['03/06/2022', '03/06/2022', '03/06/2022'],
    'id' : [11,22,33], 
    'stu1' : ['James', 'Bob', 'Ryan' ], 
    'stu2' : ['James', 'Rob', 'Tom' ]})

# melt the dataframe
df_melted = df.melt(id_vars=['date', 'id'], value_vars= ['stu1', 'stu2'])

print(df_melted)

>>>
       date     id  variable    value
0   03/06/2022  11  stu1        James
1   03/06/2022  22  stu1         Bob
2   03/06/2022  33  stu1         Ryan
3   03/06/2022  11  stu2        James
4   03/06/2022  22  stu2         Rob
5   03/06/2022  33  stu2         Tom

If you do not care at all about the column variable that helps identify the origin of the name (from which column it came) then you can simply drop it.

You can also customize the name of the output variables (instead of variable and value). Here’s an example to get your exact desired output :

df.melt(id_vars=['date', 'id'], value_vars= ['stu1', 'stu2'], value_name='stu').drop(['variable'], axis = 1)

Take a look at the original documentation pandas.melt for more info!

pandas

byMR

Published June 03, 2022

Add a comment

PowerShell Active Directory – Can't use array value in filter for Get-ADUser

byMR

June 3, 2022

Questions

Append data into data frame

byMR

June 3, 2022

Questions

resolve enum class variable name

byMR

June 3, 2022

Questions

Vue Router Push How To Set Optional Query Parameters?

byMR

June 3, 2022

Questions

pyqt5 file_open method error, i got this error when i opened an image

byMR

June 3, 2022

Combing Two Columns into Single Column on New Rows in Pandas Dataframe

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Like this:

Leave a ReplyCancel reply

Read more

PowerShell Active Directory – Can't use array value in filter for Get-ADUser

Append data into data frame

resolve enum class variable name

Vue Router Push How To Set Optional Query Parameters?

pyqt5 file_open method error, i got this error when i opened an image

Keep Up to Date with the Most Important News

Combing Two Columns into Single Column on New Rows in Pandas Dataframe

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Share this:

Like this:

Leave a ReplyCancel reply

Keep Up to Date with the Most Important News

Read more

PowerShell Active Directory – Can't use array value in filter for Get-ADUser

Append data into data frame

resolve enum class variable name

Vue Router Push How To Set Optional Query Parameters?

pyqt5 file_open method error, i got this error when i opened an image

how to find a max element of highest sequence number

Discover more from Dev solutions