Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Join 2 dataframes based on their differences while keeping common rows and columns python

Context: I’m trying to add data from a new dataframe into an old dataframe where the only differences between them is that the data has been replaced with other information. I’ll use an example to explain myself better:

Let’s say the old dataframe has this structure:

test_old = pd.DataFrame.from_dict({'FactsEN' :['sales','price','promotion','sales','price','promotion'], 'Sales' : [12345,12,11,54321,14,12], 'Type' : ['type1','type1','type1','type2','type2','type2']})


test_new = pd.DataFrame.from_dict({'FactsEN' :['sales','price','new_promotion','sales','price','new_promotion'], 'Sales' : [12345,12,11,54321,14,13], 'Type' : ['type1','type1','type1','type2','type2','type2']})

How can I join both dataframes keeping the common columns and simply adding the differences between them? Ideally the dataframes would go from this:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>>> test_old
     FactsEN  Sales   Type
0      sales  12345  type1
1      price     12  type1
2  promotion     11  type1
3      sales  54321  type2
4      price     14  type2
5  promotion     12  type2

>>> test_new
         FactsEN  Sales   Type
0          sales  12345  type1
1          price     12  type1
2  new_promotion     11  type1
3          sales  54321  type2
4          price     14  type2
5  new_promotion     13  type2

To this:

         FactsEN  Sales   Type
0          sales  12345  type1
1          price     12  type1
2      promotion     11  type1
3  new_promotion     11  type1
4          sales  54321  type2
5          price     14  type2
6      promotion     12  type2
7  new_promotion     13  type2

I was trying concatenate and merges, but none of them managed to get the output…

concatenated_dataframe = pd.merge(old,new, how='outer', left_on=new.columns, right_on=old.columns)

Any help is very welcome!

>Solution :

We can use pandas.DataFrame.merge and join on multiple columns.

pd.merge(test_old, test_new, on=['FactsEN',  'Sales',   'Type'], how="outer")

This gives us the expected data

         FactsEN  Sales   Type
0          sales  12345  type1
1          price     12  type1
2      promotion     11  type1
3          sales  54321  type2
4          price     14  type2
5      promotion     12  type2
6  new_promotion     11  type1
7  new_promotion     13  type2
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading