There are multiple Pandas dataframes with one column each and having different column names.
df1 = pd.DataFrame({'ID1':['a1','a2']})
df1:
ID1
0 a1
1 a2
df2 = pd.DataFrame({'ID2':['a1','b1']})
df2:
ID2
0 a1
1 b1
df3 = pd.DataFrame({'ID3':['a2','b1','b2']})
df3:
ID3
0 a2
1 b1
2 b2
I want to combine these dataframes into one dataframe as below.
ID1 ID2 ID3
0 a1 a1 NaN
1 a2 NaN a2
2 NaN b1 b1
3 NaN NaN b2
pd.merge() can be used if there is only two dataframes. But I want to do this for many dataframes. And also I want a separate column for each dataframe with it’s column name.
Is there a way of doing this? Thank you!
>Solution :
You can try set the ID column as index and concat them on columns
df = pd.concat([df.set_index(f'ID{i+1}').assign(**{f'ID{i+1}': 1}) for i, df in enumerate([df1, df2, df3])], axis=1)
df = df.apply(lambda col: col.mask(col.eq(1), df.index)).reset_index(drop=True)
print(df)
ID1 ID2 ID3
0 a1 a1 NaN
1 a2 NaN a2
2 NaN b1 b1
3 NaN NaN b2