Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Python Pandas DataFrame Merge on Columns with Overwrite

Is there a way to merge two Pandas DataFrames, by matching on (and retaining) supplied columns, but overwriting all the rest?

For example:

import pandas as pd

df1 = pd.DataFrame(columns=["Name", "Gender", "Age", "LastLogin", "LastPurchase"])
df1.loc[0] = ["Bob", "Male", "21", "2023-01-01", "2023-01-01"]
df1.loc[1] = ["Frank", "Male", "22", "2023-02-01", "2023-02-01"]
df1.loc[2] = ["Steve", "Male", "23", "2023-03-01", "2023-03-01"]
df1.loc[3] = ["John", "Male", "24", "2023-04-01", "2023-04-01"]

df2 = pd.DataFrame(columns=["Name", "Gender", "Age", "LastLogin", "LastPurchase"])
df2.loc[0] = ["Steve", "Male", "23", "2022-11-01", "2022-11-02"]
df2.loc[1] = ["Simon", "Male", "23", "2023-03-01", "2023-03-02"]
df2.loc[2] = ["Gary", "Male", "24", "2023-04-01", "2023-04-02"]
df2.loc[3] = ["Bob", "Male", "21", "2022-12-01", "2022-12-01"]

>>> df1
    Name Gender Age   LastLogin LastPurchase
0    Bob   Male  21  2023-01-01   2023-01-01
1  Frank   Male  22  2023-02-01   2023-02-01
2  Steve   Male  23  2023-03-01   2023-03-01
3   John   Male  24  2023-04-01   2023-04-01

>>> df2
    Name Gender Age   LastLogin LastPurchase
0  Steve   Male  23  2022-11-01   2022-11-02
1  Simon   Male  23  2023-03-01   2023-03-02
2   Gary   Male  24  2023-04-01   2023-04-02
3    Bob   Male  21  2022-12-01   2022-12-01

What I’d like is to end up with is df1 updated with values from df2, if the "Name", "Gender" and "Age" columns match. But without caring what the other columns are, so I’d end up with this:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>>> df1
    Name Gender Age   LastLogin LastPurchase
0    Bob   Male  21  2022-12-01   2022-12-01 # Updated last two columns from df2
1  Frank   Male  22  2023-02-01   2023-02-01
2  Steve   Male  23  2022-11-01   2022-11-02 # Updated last two columns from df2
3   John   Male  24  2023-04-01   2023-04-01

I can do a merge like this:

>>> df3 = df1.merge(df2, on=["Name", "Gender", "Age"], how='left')

But then I have to manually extract data from and drop the new columns created from the merge, using their names:

>>> df3['LastLogin'] = df3['LastLogin_y'].fillna(df3['LastLogin_x'])
>>> df3['LastPurchase'] = df3['LastPurchase_y'].fillna(df3['LastPurchase_x'])
>>> df3.drop(['LastLogin_x', 'LastLogin_y'], axis=1, inplace=True)
>>> df3.drop(['LastPurchase_x', 'LastPurchase_y'], axis=1, inplace=True)
>>> 
>>> df3
    Name Gender Age   LastLogin LastPurchase
0    Bob   Male  21  2022-12-01   2022-12-01
1  Frank   Male  22  2023-02-01   2023-02-01
2  Steve   Male  23  2022-11-01   2022-11-02
3   John   Male  24  2023-04-01   2023-04-01

I’m trying to avoid this, as I need a generic way to update batches of data, and I don’t know all their column names (just the ones I want to match on).

>Solution :

You can avoid the _x/_y columns by slicing only the the merging keys in df1 for merge, then fillna/combine_first with the original:

cols = ["Name", "Gender", "Age"]

df3 = df1[cols].merge(df2, how='left').fillna(df1)

A more convoluted approach using indexes:

df3 = (df2.set_index(cols)
          .combine_first(df1.set_index(cols))
          .reindex(df1[cols]).reset_index()
       )

Output:

    Name Gender Age   LastLogin LastPurchase
0    Bob   Male  21  2022-12-01   2022-12-01
1  Frank   Male  22  2023-02-01   2023-02-01
2  Steve   Male  23  2022-11-01   2022-11-02
3   John   Male  24  2023-04-01   2023-04-01
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading