Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Use an alternative key if key missing in pandas merge

I would like to merge two dataframes df1 and df2, with the condition that if a value of the right-side in key2 is not present on the left-side in key1, than the alternative_key is used. Is there any way to do that in a nice way?

a = {'key1': ['a','b','c'], 'alternative_key':['f','g','h']}
df1 = pd.DataFrame(data=a)


b = {'key2':['a','b','h'], 'some_stuff': [1,2,3]}
df2 = pd.DataFrame(data=b)

df_final=df1.merge(df2, left_on='key1', right_on='key2', how='left')

expected result in df_final:

  key1 alternative_key  some_stuff
0    a               f           1
1    b               g           2
2    c               h           3

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

Perform both merges and combine_first in the desired order of priority:

(df1.merge(df2, left_on='key1', right_on='key2', how='left')
    .combine_first(df1.merge(df2, left_on='alternative_key',
                             right_on='key2', how='left'))
    .drop(columns='key2')
)

Alternative ways of writing the code (using variables):

merge1 = df1.merge(df2, left_on='key1', right_on='key2', how='left')
merge2 = df1.merge(df2, left_on='alternative_key', right_on='key2', how='left')

out = merge1.combine_first(merge2).drop(columns='key2')

output:

  key1 alternative_key  some_stuff
0    a               f         1.0
1    b               g         2.0
2    c               h         3.0

NB. the advantage of this method is that you can have as many other columns as you want. All will be merged in order.

arbitrary number of alternative keys:

from functools import reduce

keys = ['key1', 'alternative_key']

out = (reduce(lambda a,b: a.combine_first(b),
              [df1.merge(df2, left_on=k, right_on='key2', how='left')
               for k in keys])
       .drop(columns='key2')
      )
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading