I suspect this question is a duplicate, but I couldn't find a suitable answer.
I have some problems merging multiple dataframes sequentially.
For example, I have four dataframes as below:
import pandas as pd

df1 = pd.DataFrame({'source': ['A', 'A', 'A', 'B', 'B', 'C', 'C'],
                    'target': ['1', '2', '3', '4', '5', '6', '7']})
df2 = pd.DataFrame({'source': ['A', 'A'],
                    'temp': ['a', 'b']})
df3 = pd.DataFrame({'source': ['B', 'B'],
                    'temp': ['c', 'd']})
df4 = pd.DataFrame({'source': ['C'],
                    'temp': ['e']})
I'd like to merge them into a single dataframe like this:
#   source target temp
#0       A      1    a
#1       A      1    b
#2       A      2    a
#3       A      2    b
#4       A      3    a
#5       A      3    b
#6       B      4    c
#7       B      4    d
#8       B      5    c
#9       B      5    d
#10      C      6    e
#11      C      7    e
I tried the following approaches, but none of them produced the expected result.
# Trial 1
# The second merge creates new columns with suffixes (temp_x, temp_y),
# but I want to keep only three columns: source, target, temp.
dfs = pd.merge(df1, df2, on='source', how='left')
dfs = pd.merge(dfs, df3, on='source', how='left')

# Trial 2
# This fills each existing NaN with one value, so it cannot handle a source
# that has a single NaN row in dfs but several temp values in df3 or df4.
dfs = pd.merge(df1, df2, on='source', how='left')
dfs['temp'] = dfs.set_index('source')['temp'].fillna(df3.set_index('source')['temp'].to_dict()).values

# Trial 3
# This did not change dfs at all.
dfs = pd.merge(df1, df2, on='source', how='left')
dfs[dfs['source']=='B']['temp'] = pd.merge(df1, df3, on='source', how='left')['temp'].dropna()
>Solution :
A chain of merges won't do this, because each merge adds its own temp column. Concatenate df2, df3, and df4 into one lookup table first, then merge that with df1 once:
df1.merge(pd.concat([df2, df3, df4]).drop_duplicates(), on='source')
Output:
   source target temp
0       A      1    a
1       A      1    b
2       A      2    a
3       A      2    b
4       A      3    a
5       A      3    b
6       B      4    c
7       B      4    d
8       B      5    c
9       B      5    d
10      C      6    e
11      C      7    e
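For reference, here is a self-contained sketch that puts the pieces together using the frames from the question. It first reproduces the column clash from Trial 1, then applies the concat-then-merge approach. The names `chained`, `lookup`, and `result` are illustrative, and `ignore_index=True` and `how='left'` are my own additions rather than part of the one-liner above: the former just gives the stacked table a clean index, and the latter should keep rows of df1 whose source has no match (with NaN in temp) instead of dropping them.

```python
import pandas as pd

df1 = pd.DataFrame({'source': ['A', 'A', 'A', 'B', 'B', 'C', 'C'],
                    'target': ['1', '2', '3', '4', '5', '6', '7']})
df2 = pd.DataFrame({'source': ['A', 'A'], 'temp': ['a', 'b']})
df3 = pd.DataFrame({'source': ['B', 'B'], 'temp': ['c', 'd']})
df4 = pd.DataFrame({'source': ['C'], 'temp': ['e']})

# Chained merges: the second merge finds 'temp' on both sides and
# disambiguates the two columns with the default suffixes.
chained = (df1.merge(df2, on='source', how='left')
              .merge(df3, on='source', how='left'))
print(list(chained.columns))

# Stack the small frames into one source -> temp lookup table.
# ignore_index=True is an assumption added here for a clean 0..n-1 index.
lookup = pd.concat([df2, df3, df4], ignore_index=True).drop_duplicates()

# A single merge pairs every df1 row with every temp value of its source.
# how='left' is an assumption added here; the default is an inner join.
result = df1.merge(lookup, on='source', how='left')
print(result)
```

With the sample data every source has at least one match, so the left join and the default inner join give the same 12 rows. The difference only matters if df1 contains a source that none of df2, df3, or df4 covers.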