How to merge multiple dataframes sequentially?

Although I expected this question to be a duplicate, I couldn’t find a suitable answer.

I have some problems merging multiple dataframes sequentially.

For example, I have four dataframes as below:

import pandas as pd

df1 = pd.DataFrame({'source': ['A', 'A', 'A', 'B', 'B', 'C', 'C'],
                    'target': ['1', '2', '3', '4', '5', '6', '7']})
df2 = pd.DataFrame({'source': ['A', 'A'],
                    'temp': ['a', 'b']})
df3 = pd.DataFrame({'source': ['B', 'B'],
                    'temp': ['c', 'd']})
df4 = pd.DataFrame({'source': ['C'],
                    'temp': ['e']})

I’d like to merge them into a single dataframe like this:

#   source  target  temp
#0  A   1   a
#1  A   1   b
#2  A   2   a
#3  A   2   b
#4  A   3   a
#5  A   3   b
#6  B   4   c
#7  B   4   d
#8  B   5   c
#9  B   5   d
#10 C   6   e
#11 C   7   e

To do so, I tried the code below, but each attempt returned unexpected results.

#Trial 1
dfs = pd.merge(df1, df2, on='source', how='left')
dfs = pd.merge(dfs, df3, on='source', how='left') # new columns were created with suffixes (temp_x, temp_y), but I want to keep only three columns: source, target, temp

#Trial 2
dfs = pd.merge(df1, df2, on='source', how='left')
dfs['temp']=dfs.set_index('source')['temp'].fillna(df3.set_index('source')['temp'].to_dict()).values # this only fills a fixed number of NaN values, but there are exceptions: dfs may have one NaN for a source while df3 or df4 has multiple values for it

#Trial 3
dfs = pd.merge(df1, df2, on='source', how='left')
dfs[dfs['source']=='B']['temp']=pd.merge(df1, df3, on='source', how='left')['temp'].dropna() # dfs was left unchanged

>Solution :

This is not a simple merge. Concatenate df2, df3, and df4 into a single dataframe first, then merge the result with df1:

df1.merge(pd.concat([df2,df3,df4]).drop_duplicates(), on='source')

Output:

   source target temp
0       A      1    a
1       A      1    b
2       A      2    a
3       A      2    b
4       A      3    a
5       A      3    b
6       B      4    c
7       B      4    d
8       B      5    c
9       B      5    d
10      C      6    e
11      C      7    e
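
For completeness, here is a self-contained sketch of the same concat-then-merge approach using the dataframes from the question. The names `lookup` and `result` are illustrative and not part of the original answer, and `ignore_index=True` is an optional addition that only tidies the index of the stacked frame.

```python
import pandas as pd

df1 = pd.DataFrame({'source': ['A', 'A', 'A', 'B', 'B', 'C', 'C'],
                    'target': ['1', '2', '3', '4', '5', '6', '7']})
df2 = pd.DataFrame({'source': ['A', 'A'], 'temp': ['a', 'b']})
df3 = pd.DataFrame({'source': ['B', 'B'], 'temp': ['c', 'd']})
df4 = pd.DataFrame({'source': ['C'], 'temp': ['e']})

# Stack the three small frames into one source -> temp table.
# drop_duplicates() guards against repeated (source, temp) pairs,
# which would otherwise multiply rows in the merge.
lookup = pd.concat([df2, df3, df4], ignore_index=True).drop_duplicates()

# A single merge on 'source' pairs every df1 row with every matching
# lookup row, so no suffixed temp_x / temp_y columns are created.
# The default how='inner' drops df1 rows whose source has no match.
result = df1.merge(lookup, on='source')

print(result)
```

If some `source` values in df1 might be missing from the stacked frame and those rows should be kept, passing `how='left'` to `merge` keeps them with NaN in `temp`.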
