How to merge multiple DataFrames sequentially?

Although I suspect this question is a duplicate, I couldn't find a suitable answer.

I'm having trouble merging multiple DataFrames sequentially.

For example, I have the following four DataFrames:

import pandas as pd

df1 = pd.DataFrame({'source': ['A', 'A', 'A', 'B', 'B', 'C', 'C'],
                    'target': ['1', '2', '3', '4', '5', '6', '7']})
df2 = pd.DataFrame({'source': ['A', 'A'],
                    'temp': ['a', 'b']})
df3 = pd.DataFrame({'source': ['B', 'B'],
                    'temp': ['c', 'd']})
df4 = pd.DataFrame({'source': ['C'],
                    'temp': ['e']})

I'd like to merge them into this result:

#   source  target  temp
#0  A   1   a
#1  A   1   b
#2  A   2   a
#3  A   2   b
#4  A   3   a
#5  A   3   b
#6  B   4   c
#7  B   4   d
#8  B   5   c
#9  B   5   d
#10 C   6   e
#11 C   7   e

I tried the approaches below, but none of them gave the expected result.

# Trial 1
dfs = pd.merge(df1, df2, on='source', how='left')
dfs = pd.merge(dfs, df3, on='source', how='left')
# Problem: the second merge creates suffixed columns (temp_x, temp_y),
# but I want to keep just three columns: source, target, temp.

# Trial 2
dfs = pd.merge(df1, df2, on='source', how='left')
dfs['temp'] = dfs.set_index('source')['temp'].fillna(df3.set_index('source')['temp'].to_dict()).values
# Problem: fillna puts only one value into each NaN, but a single NaN row
# in dfs may need to match multiple rows in df3 or df4.

# Trial 3
dfs = pd.merge(df1, df2, on='source', how='left')
dfs[dfs['source'] == 'B']['temp'] = pd.merge(df1, df3, on='source', how='left')['temp'].dropna()
# Problem: this doesn't change dfs at all (the chained indexing assigns
# to a temporary copy, not to dfs itself).
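To see why Trial 1 ends up with extra columns, here is a minimal reproduction using the question's df1, df2, and df3. When both frames passed to `pd.merge` already carry a non-key column with the same name (`temp` here), pandas keeps both and disambiguates them with the default `suffixes=('_x', '_y')`, so a chain of merges can never collapse into a single `temp` column.

```python
import pandas as pd

df1 = pd.DataFrame({'source': ['A', 'A', 'A', 'B', 'B', 'C', 'C'],
                    'target': ['1', '2', '3', '4', '5', '6', '7']})
df2 = pd.DataFrame({'source': ['A', 'A'], 'temp': ['a', 'b']})
df3 = pd.DataFrame({'source': ['B', 'B'], 'temp': ['c', 'd']})

# First merge: A rows get a/b, B and C rows get NaN in 'temp'.
dfs = pd.merge(df1, df2, on='source', how='left')

# Second merge: both sides now have a non-key 'temp' column,
# so pandas renames them with the default suffixes '_x' and '_y'.
dfs = pd.merge(dfs, df3, on='source', how='left')

print(dfs.columns.tolist())
# ['source', 'target', 'temp_x', 'temp_y']
```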

Solution:

This isn't a simple chain of merges. Concatenate df2, df3, and df4 first, then merge the combined result with df1:

df1.merge(pd.concat([df2,df3,df4]).drop_duplicates(), on='source')
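Here is a self-contained version of that one-liner, split into two steps so the intermediate lookup table is visible. The variable names `lookup` and `out` are just illustrative, and `ignore_index=True` is an optional addition that only renumbers the stacked table's index; it doesn't change the merged result.

```python
import pandas as pd

df1 = pd.DataFrame({'source': ['A', 'A', 'A', 'B', 'B', 'C', 'C'],
                    'target': ['1', '2', '3', '4', '5', '6', '7']})
df2 = pd.DataFrame({'source': ['A', 'A'], 'temp': ['a', 'b']})
df3 = pd.DataFrame({'source': ['B', 'B'], 'temp': ['c', 'd']})
df4 = pd.DataFrame({'source': ['C'], 'temp': ['e']})

# Step 1: stack the lookup tables into one long (source, temp) table.
# drop_duplicates() removes any (source, temp) pair repeated across tables.
lookup = pd.concat([df2, df3, df4], ignore_index=True).drop_duplicates()

# Step 2: a single merge on 'source'. Each df1 row is repeated once per
# matching lookup row, so 'temp' stays one column (no suffixes).
out = df1.merge(lookup, on='source')
print(out)
```

The merge uses the default `how='inner'`, which drops any df1 row whose source has no match in the lookup table; pass `how='left'` if those rows should be kept with NaN in `temp`.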

Output:

   source target temp
0       A      1    a
1       A      1    b
2       A      2    a
3       A      2    b
4       A      3    a
5       A      3    b
6       B      4    c
7       B      4    d
8       B      5    c
9       B      5    d
10      C      6    e
11      C      7    e
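If you'd rather keep the per-table "sequential" structure from the question, one alternative (not part of the original answer) is to merge df1 with each lookup table separately and then stack the partial results. Each inner merge keeps only the sources that table covers, so every partial result has exactly the three columns source, target, temp.

```python
import pandas as pd

df1 = pd.DataFrame({'source': ['A', 'A', 'A', 'B', 'B', 'C', 'C'],
                    'target': ['1', '2', '3', '4', '5', '6', '7']})
df2 = pd.DataFrame({'source': ['A', 'A'], 'temp': ['a', 'b']})
df3 = pd.DataFrame({'source': ['B', 'B'], 'temp': ['c', 'd']})
df4 = pd.DataFrame({'source': ['C'], 'temp': ['e']})

# Inner-merge df1 with each lookup table on its own; rows whose source
# isn't in that table are dropped from that partial result.
parts = [df1.merge(lookup, on='source') for lookup in (df2, df3, df4)]

# Stack the partial results and renumber the rows 0..n-1.
out = pd.concat(parts, ignore_index=True)
print(out)
```

Unlike the concat-then-merge version, this does not deduplicate: a (source, temp) pair present in several lookup tables produces repeated rows, and the output is grouped by lookup table rather than following df1's row order. For the data in the question both approaches give the same 12 rows.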