Home Merging two data frames based on a common column with repeated values

Questions

Merging two data frames based on a common column with repeated values

November 9, 2022

I have 2 data frames with different lengths:
len(df1) 10104
len(df2) 15560

I want to merge these based on a common column (taskID) on both data frames. The Task ID has repeated IDs each ID represent an item belongs to the same task.

example:

df1:

TaskID	task duration
task 45	2 mins
task 45	5 mins
task 45	7 mins
task 67	8 mins
task 67	9 mins

df2:

TaskID	gender
task 45	male
task 45	male
task 45	male
task 67	female
task 67	female

when I use Pandas merge function I get an output with a length of 471437 which is greater than the length of both data frames.

I think this is because of the repeated values of TaskID, and I can’t remove the duplicates as each row represent different item.

I tried:

merged=pd.merge(df1,df2, on=’TaskID’, how=’inner’)

I get an output with a length of 471437

Can you please help with this issue ?

>Solution :

You can use groupby.cumcount to deduplicate the TaskIDs and merge in order:

df1.merge(df2, left_on=['TaskID', df1.groupby('TaskID').cumcount()],
               right_on=['TaskID', df2.groupby('TaskID').cumcount()])

Output:

    TaskID  key_1 task duration  gender
0  task 45      0        2 mins    male
1  task 45      1        5 mins    male
2  task 45      2        7 mins    male
3  task 67      0        8 mins  female
4  task 67      1        9 mins  female

If the gender is unique per TaskID, you can update in place with:

df1['gender'] = df1['TaskID'].map(df2.drop_duplicates('TaskID').set_index('TaskID')['gender'])

dataframe

byMR

Published November 09, 2022

Add a comment

Display an element after a block line brea

byMR

November 9, 2022

Questions

What is await with question mark (await?) in Rust?

byMR

November 9, 2022

Questions

Pandas: splitting a string and returning multiple elements does not work – only one element is returned

byMR

November 9, 2022

Questions

Is there any command to create new columns and include the characters?

byMR

November 9, 2022

Questions

How to put information from input into a variable?

byMR

November 9, 2022

Questions

Why do I not have the option to open my Android app?

byMR

November 9, 2022

Merging two data frames based on a common column with repeated values

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Like this:

Leave a ReplyCancel reply

Read more

Display an element after a block line brea

What is await with question mark (await?) in Rust?

Pandas: splitting a string and returning multiple elements does not work – only one element is returned

Is there any command to create new columns and include the characters?

How to put information from input into a variable?

Why do I not have the option to open my Android app?

Keep Up to Date with the Most Important News

Merging two data frames based on a common column with repeated values

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Share this:

Like this:

Leave a ReplyCancel reply

Keep Up to Date with the Most Important News

Read more

Display an element after a block line brea

What is await with question mark (await?) in Rust?

Pandas: splitting a string and returning multiple elements does not work – only one element is returned

Is there any command to create new columns and include the characters?

How to put information from input into a variable?

Why do I not have the option to open my Android app?

Discover more from Dev solutions