Combine a list of DataFrames into one big DataFrame, avoiding duplicate columns and rows

I have several data points stored as single-row DataFrames in a list, and I want to combine them into one pandas DataFrame. Minimal example:

import pandas as pd

list_of_frames = [
    pd.DataFrame({'name': 'adam', 'height': '180'}, index=[0]),
    pd.DataFrame({'name': 'adam', 'weight': '80'}, index=[1]),
    pd.DataFrame({'name': 'eve', 'height': '190'}, index=[2]),
]

How do I obtain the following DataFrame?

    name    height  weight
0   adam    180     80
1   eve     190     NaN

If I call pd.concat(list_of_frames), I get one row per input frame instead:

    name    height  weight
0   adam    180     NaN
1   adam    NaN     80
2   eve     190     NaN

The height column has been combined correctly, but adam still appears on two rows. Can I collapse this DataFrame into one row per name?

Alternatively, I tried reduce(lambda l, r: pd.merge(l, r, on='name', how='outer'), list_of_frames) (with reduce imported from functools), which gives

    name    height_x  weight  height_y
0   adam    180       80      NaN
1   eve     NaN       NaN     190

Here, height is split into two separate columns, height_x and height_y. I feel like I am missing something obvious. Thanks for the help!
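One way to keep the reduce idea without the _x/_y suffixes is to index each frame by name and fold with DataFrame.combine_first instead of pd.merge. This is a swapped-in technique, not the accepted answer; it is a minimal sketch that assumes name is unique within each input frame.

```python
from functools import reduce

import pandas as pd

list_of_frames = [
    pd.DataFrame({'name': 'adam', 'height': '180'}, index=[0]),
    pd.DataFrame({'name': 'adam', 'weight': '80'}, index=[1]),
    pd.DataFrame({'name': 'eve', 'height': '190'}, index=[2]),
]

# Index every frame by 'name' so rows and columns align on labels
# instead of producing height_x / height_y suffixes.
indexed = [df.set_index('name') for df in list_of_frames]

# combine_first keeps the left value and fills its gaps (NaN) from the right,
# taking the union of both indexes and both column sets.
result = reduce(lambda left, right: left.combine_first(right), indexed).reset_index()
print(result)
```

When two frames disagree on a value, combine_first keeps the one from the earlier frame, which mirrors how groupby.first treats rows in list order.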

Solution:

If your inputs are always single-row DataFrames and "name" acts as a unique key, use groupby.first, which takes the first non-null value of each column within each group:

pd.concat(list_of_frames).groupby('name', as_index=False).first()

Output:

   name height weight
0  adam    180     80
1   eve    190   None
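A self-contained version of the answer, wrapped in a hypothetical helper collapse(). It also checks an assumption the answer relies on: groupby.first silently keeps the first non-null value, so if two frames disagree (say, two different heights for adam) one value is dropped without warning. The nunique check is an illustrative addition, not part of the original answer.

```python
import pandas as pd

list_of_frames = [
    pd.DataFrame({'name': 'adam', 'height': '180'}, index=[0]),
    pd.DataFrame({'name': 'adam', 'weight': '80'}, index=[1]),
    pd.DataFrame({'name': 'eve', 'height': '190'}, index=[2]),
]


def collapse(frames, key='name'):
    """Hypothetical helper: concatenate frames, then keep one row per key."""
    combined = pd.concat(frames)
    # Count distinct non-null values per key and column; more than one means
    # the frames disagree and groupby.first would silently keep only one.
    conflicts = combined.groupby(key).nunique() > 1
    if conflicts.to_numpy().any():
        bad_keys = list(conflicts.index[conflicts.any(axis=1).to_numpy()])
        raise ValueError(f"conflicting values for {key}: {bad_keys}")
    # first() takes the first non-null value of each column within each group.
    return combined.groupby(key, as_index=False).first()


result = collapse(list_of_frames)
print(result)
```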