So I have two tables I’d like to merge. However, the column in the second table has references to two columns of the first table, e.g.:
Table 1 python:
table1 = pd.DataFrame({
'Key1': ['Value1', 'Value2', 'Value3'],
'Key2': ['Apples', 'Pears', 'Oranges'],
'Text': ['Hello', 'World', '!'],
'Values': [5, 8, 12],
})
Table 1:
╔════════╦═════════╦═══════╦════════╗
║ Key1 ║ Key2 ║ Text ║ Values ║
╠════════╬═════════╬═══════╬════════╣
║ Value1 ║ Apples ║ Hello ║ 5 ║
║ Value2 ║ Pears ║ World ║ 8 ║
║ Value3 ║ Oranges ║ ! ║ 12 ║
╚════════╩═════════╩═══════╩════════╝
Table 2 python:
table2 = pd.DataFrame({'Key1_Key2': ['Value2', 'Apples', 'Oranges'],
'Value_to_Extract': ['Value 123', 'Value 88',
'Value 99']})
Table 2:
╔═══════════╦══════════════════╗
║ Key1_Key2 ║ Value_to_Extract ║
╠═══════════╬══════════════════╣
║ Value2 ║ Value 123 ║
║ Apples ║ Value 88 ║
║ Oranges ║ Value 99 ║
╚═══════════╩══════════════════╝
I first would like to merge based on Key1
values, and only if that fails, then on Key2
values. It should be an exact match, so I am not looking to concatenate Key1 and Key2 and then search for a substring because then if there are values that look alike (e.g. ‘Apples’ and ‘Apple’) then it will wreak havoc.
Result should look like this:
╔════════╦═════════╦═══════╦════════╦══════════════════╗
║ Key1 ║ Key2 ║ Text ║ Values ║ Value_To_Extract ║
╠════════╬═════════╬═══════╬════════╬══════════════════╣
║ Value1 ║ Apples ║ Hello ║ 5 ║ Value 88 ║
║ Value2 ║ Pears ║ World ║ 8 ║ Value 123 ║
║ Value3 ║ Oranges ║ ! ║ 12 ║ Value 99 ║
╚════════╩═════════╩═══════╩════════╩══════════════════╝
>Solution :
You should perform the two merge
independently, then combine_first
in order of preference to fill the missing values only where the first merge failed:
m1 = table1.merge(table2, left_on='Key1', right_on='Key1_Key2', how='left')
m2 = table1.merge(table2, left_on='Key2', right_on='Key1_Key2', how='left')
# uncomment to remove merge column
m1.combine_first(m2)#.drop(columns='Key1_Key2')
output:
Key1 Key2 Text Values Key1_Key2 Value_to_Extract
0 Value1 Apples Hello 5 Apples Value 88
1 Value2 Pears World 8 Value2 Value 123
2 Value3 Oranges ! 12 Oranges Value 99`