Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Merge three different dataframes in Python

I want to merge three data frames in Python, the code I have now provide me with some wrong outputs.

This is the first data frame

 df_1
   Year    Month        X_1    Y_1
 0  2021    January     $90     $100
 1  2021    February    NaN     $120
 2  2021    March       $100    $130
 3  2021    April       $110    $140
 4  2021    May         Nan     $150
 5  2019    June        $120    $160
   

This is the second data frame

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

  df_2
    Year    Month       X_2     Y_2
  0 2021    January     Nan     $120
  1 2021    February    NaN     $130
  2 2021    March       $80     $140
  3 2021    April       $90     $150
  4 2021    May         Nan     $150
  5 2021    June        $120    $170

This is the third data frame

   df_3
     Year    Month      X_3     Y_3
  0 2021    January     $110    $150
  1 2021    February    $140    $160
  2 2021    March       $97     $170
  3 2021    April       $90     $180
  4 2021    May         Nan     $190
  5 2021    June        $120    $200

The idea is to combine them into one data frame like this:

 df_combined
    Year    Month       X_1    Y_1    X_2    Y_2    X_3    Y_3
0   2019    January     $90     $100  NaN    $120   $110    $150 
1   2019    February    NaN     $120  NaN    $130   $140    $160
2   2019    March       $100    $130  $80    $140   $97     $170
3   2019    April       $110    $140  $90    $150   $90     $180
4   2019    May         Nan     $150  Nan    $150   Nan     $190
5   2019    June        $120    $160  $120   $170   $120    $200

The code I have for now does not give me the correct outcome, only df_3 has to the correct numbers.

 # compile the list of data frames you want to merge
 import functools as ft
 from functools import reduce
    
 data_frames = [df_1, df_2, df_3]
 df_merged = reduce(lambda  cross, right: pd.merge(cross,right,on=['Year'],
                                                           how='outer'),data_frames)
  #remove superfluous columns
  df_merged.drop(['Month_x', 'Month_y'], axis=1, inplace=True)

>Solution :

One option of probably many is to do

from functools import reduce
import pandas as pd

idx = ["Year", "Month"]
new_df = reduce(pd.DataFrame.join, (i.set_index(idx) for i in dataframes)).reset_index()

or

reduce(lambda x, y: pd.merge(x, y, how="outer", on=["Year", "Month"]), dataframes)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading