Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Merging two data frames based on a common column with repeated values

I have 2 data frames with different lengths:
len(df1) 10104
len(df2) 15560

I want to merge these based on a common column (taskID) on both data frames. The Task ID has repeated IDs each ID represent an item belongs to the same task.

example:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df1:

TaskID task duration
task 45 2 mins
task 45 5 mins
task 45 7 mins
task 67 8 mins
task 67 9 mins

df2:

TaskID gender
task 45 male
task 45 male
task 45 male
task 67 female
task 67 female

when I use Pandas merge function I get an output with a length of 471437 which is greater than the length of both data frames.

I think this is because of the repeated values of TaskID, and I can’t remove the duplicates as each row represent different item.

I tried:

merged=pd.merge(df1,df2, on=’TaskID’, how=’inner’)

I get an output with a length of 471437

Can you please help with this issue ?

>Solution :

You can use groupby.cumcount to deduplicate the TaskIDs and merge in order:

df1.merge(df2, left_on=['TaskID', df1.groupby('TaskID').cumcount()],
               right_on=['TaskID', df2.groupby('TaskID').cumcount()])

Output:

    TaskID  key_1 task duration  gender
0  task 45      0        2 mins    male
1  task 45      1        5 mins    male
2  task 45      2        7 mins    male
3  task 67      0        8 mins  female
4  task 67      1        9 mins  female

If the gender is unique per TaskID, you can update in place with:

df1['gender'] = df1['TaskID'].map(df2.drop_duplicates('TaskID').set_index('TaskID')['gender'])
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading