Perform merge for specific duplicate rows in pandas DataFrame

Consider the following two DataFrames in Python:

df:

code_1 other
19001 white
19009 blue
19008 red

df_1:

code_1 code_2
19001 00001
19001 00002
19009 00003
19008 00004

I want to merge df with df_1:

    df_merge = pd.merge(df, df_1, how="left", on=['code_1'])

df_merge:

code_1 other code_2
19001 white 00001
19001 white 00002
19009 blue 00003
19008 red 00004

I want the merge to ignore the duplicate code_1 values and use only the first matching row for each code. I could call drop_duplicates on [other, code_1] after merging, but I would like to know whether the merge function has a parameter that does this directly.

Expected result:

code_1 other code_2
19001 white 00001
19009 blue 00003
19008 red 00004

>Solution :

As far as I know, pandas.merge() has no parameter that does this. You can get the same result by dropping the duplicates before merging, assuming the duplicates occur only in df_1. By default, drop_duplicates keeps the first occurrence of each code_1:

    df_merge = df.merge(df_1.drop_duplicates('code_1'), how="left", on=['code_1'])
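
For reference, here is a minimal self-contained sketch of that approach using the sample data from the question. Two things are assumptions on my side: the codes are stored as strings (so the leading zeros survive), and the code_2 value for 19008 is taken as 00004 to match the expected result. The name first_matches is only illustrative, and validate="many_to_one" is an optional extra that makes pandas raise an error if code_1 is still duplicated on the right-hand side.

```python
import pandas as pd

# Sample data from the question (assumption: codes are strings, not integers).
df = pd.DataFrame({
    "code_1": ["19001", "19009", "19008"],
    "other": ["white", "blue", "red"],
})
df_1 = pd.DataFrame({
    "code_1": ["19001", "19001", "19009", "19008"],
    "code_2": ["00001", "00002", "00003", "00004"],
})

# Keep only the first row for each code_1 in the right-hand frame.
first_matches = df_1.drop_duplicates(subset="code_1", keep="first")

# Left merge; validate raises MergeError if code_1 is not unique on the right.
df_merge = df.merge(first_matches, how="left", on="code_1", validate="many_to_one")

print(df_merge)
```

If df itself could contain repeated code_1 values, this only removes the duplication coming from df_1; rows repeated in df stay repeated in the result.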