Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas merge on column 2 if column 1 did not produce a match

So I have two tables I’d like to merge. However, the column in the second table has references to two columns of the first table, e.g.:

Table 1 python:

table1 = pd.DataFrame({
    'Key1': ['Value1', 'Value2', 'Value3'],
    'Key2': ['Apples', 'Pears', 'Oranges'],
    'Text': ['Hello', 'World', '!'],
    'Values': [5, 8, 12],
    })

Table 1:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

╔════════╦═════════╦═══════╦════════╗
║  Key1  ║  Key2   ║ Text  ║ Values ║
╠════════╬═════════╬═══════╬════════╣
║ Value1 ║ Apples  ║ Hello ║      5 ║
║ Value2 ║ Pears   ║ World ║      8 ║
║ Value3 ║ Oranges ║ !     ║     12 ║
╚════════╩═════════╩═══════╩════════╝

Table 2 python:

table2 = pd.DataFrame({'Key1_Key2': ['Value2', 'Apples', 'Oranges'],
                          'Value_to_Extract': ['Value 123', 'Value 88',
                          'Value 99']})

Table 2:

╔═══════════╦══════════════════╗
║ Key1_Key2 ║ Value_to_Extract ║
╠═══════════╬══════════════════╣
║ Value2    ║ Value 123        ║
║ Apples    ║ Value 88         ║
║ Oranges   ║ Value 99         ║
╚═══════════╩══════════════════╝

I first would like to merge based on Key1 values, and only if that fails, then on Key2 values. It should be an exact match, so I am not looking to concatenate Key1 and Key2 and then search for a substring because then if there are values that look alike (e.g. ‘Apples’ and ‘Apple’) then it will wreak havoc.

Result should look like this:

╔════════╦═════════╦═══════╦════════╦══════════════════╗
║  Key1  ║  Key2   ║ Text  ║ Values ║ Value_To_Extract ║
╠════════╬═════════╬═══════╬════════╬══════════════════╣
║ Value1 ║ Apples  ║ Hello ║      5 ║ Value 88         ║
║ Value2 ║ Pears   ║ World ║      8 ║ Value 123        ║
║ Value3 ║ Oranges ║ !     ║     12 ║ Value 99         ║
╚════════╩═════════╩═══════╩════════╩══════════════════╝

>Solution :

You should perform the two merge independently, then combine_first in order of preference to fill the missing values only where the first merge failed:

m1 = table1.merge(table2, left_on='Key1', right_on='Key1_Key2', how='left')
m2 = table1.merge(table2, left_on='Key2', right_on='Key1_Key2', how='left')

                    # uncomment to remove merge column
m1.combine_first(m2)#.drop(columns='Key1_Key2') 

output:

     Key1     Key2   Text  Values Key1_Key2 Value_to_Extract
0  Value1   Apples  Hello       5    Apples         Value 88
1  Value2    Pears  World       8    Value2        Value 123
2  Value3  Oranges      !      12   Oranges         Value 99`
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading