Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Combine multiple dataframes which have different column names into a new dataframe while adding new columns

There are multiple Pandas dataframes with one column each and having different column names.

df1 = pd.DataFrame({'ID1':['a1','a2']})
df1:
    ID1
0   a1
1   a2    

df2 = pd.DataFrame({'ID2':['a1','b1']})
df2:
    ID2
0   a1
1   b1

df3 = pd.DataFrame({'ID3':['a2','b1','b2']})
df3:
    ID3
0   a2
1   b1
2   b2

I want to combine these dataframes into one dataframe as below.

    ID1   ID2   ID3
0   a1    a1    NaN
1   a2    NaN   a2
2   NaN   b1    b1
3   NaN   NaN   b2

pd.merge() can be used if there is only two dataframes. But I want to do this for many dataframes. And also I want a separate column for each dataframe with it’s column name.
Is there a way of doing this? Thank you!

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

You can try set the ID column as index and concat them on columns

df = pd.concat([df.set_index(f'ID{i+1}').assign(**{f'ID{i+1}': 1}) for i, df in enumerate([df1, df2, df3])], axis=1)
df = df.apply(lambda col: col.mask(col.eq(1), df.index)).reset_index(drop=True)
print(df)

   ID1  ID2  ID3
0   a1   a1  NaN
1   a2  NaN   a2
2  NaN   b1   b1
3  NaN  NaN   b2
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading