Pandas merge two dataframes horizontally

December 7, 2022

I am dealing with a problem like this:

mydata_old
|  x  |  y   |
|-----|------|
| 1   | 10   |
| 2   | 15   |
| 3   | 12   |
| 4   | 21   |

mydata_new
|  x  |  y   |
|-----|------|
| 4   | 20   |
| 5   | 25   |
| 6   | 30   |

I would like to combine both tables to get:

mydata
|  x  |  y   |
|-----|------|
| 1   | 10   |
| 2   | 15   |
| 3   | 12   |
| 4   | 20   | # x = 4 does not show up twice
| 5   | 25   |
| 6   | 30   |

I tried with pandas.merge, but I cannot get the desired result.

mydata = pd.merge(mydata_new, mydata_old, how="left")

>Solution :

One way to do this is with an outer join (i.e., keep the index from both dataframes). You can achieve this using pd.merge(mydata_new, mydata_old, how='outer') and then keep the newest data available.

However, this could be more straight forward with pandas.concat.

Here, I am concatenating both dataframes. Note that I exclude rows from mydata_old if x appears in mydata_new. This way you can keep the newest data when it’s available in both dataframes (old and new).

# Concat frames and if data is available in both, keep data from mydata_new
mydata = pd.concat(
    [
        mydata_old.loc[~mydata_old['x'].isin(mydata_new['x'])],
        mydata_new
    ],
    axis=0)

This will print the following: