Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Concatenate pandas DataFrames on columns, similar to outer merge

I have 3 dataframes with dates on the first column of each. I would like to concat these dataframes but concating related with the row value of each. If the values match, add on the same row, otherwise, I would expect to have a NaN.

import numpy as np
import pandas as pd

# Create the pandas DataFrame
df1 = pd.DataFrame(['2018-12-31','2019-09-30','2022-01-31'], columns = ['Date1'])
df2 = pd.DataFrame(['2019-09-30','2022-02-28'], columns = ['Date2'])
df3 = pd.DataFrame(['2019-09-30','2021-06-30','2021-11-30','2022-03-31'], columns = ['Date3'])

display(df1)
display(df2)
display(df3)

data = {'Date1': ['2018-12-31','2019-09-30',np.nan,np.nan,'2022-01-31',np.nan,np.nan],
        'Date2': [np.nan,'2019-09-30',np.nan,np.nan,np.nan,'2022-02-28',np.nan],
        'Date3': [np.nan,'2019-09-30','2021-06-30','2021-11-30',np.nan,np.nan,'2022-01-31']}

desired_df = pd.DataFrame(data)
desired_df

This is what I am trying to achieve.

Date1 Date2 Date3
0 2018-12-31 NaN NaN
1 2019-09-30 2019-09-30 2019-09-30
2 NaN NaN 2021-06-30
3 NaN NaN 2021-11-30
4 2022-01-31 NaN NaN
5 NaN 2022-02-28 NaN
6 NaN NaN 2022-01-31

My original idea was to used something like:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

pd.concat([df1,df2,df3], axis=1, join="outer")

However, above will produce something like:

Date1 Date2 Date3
2018-12-31 2019-09-30 2019-09-30
2019-09-30 2022-02-28 2021-06-30
2022-01-31 NaN 2021-11-30
NaN NaN 2022-03-31

>Solution :

We could set_index with the Dates (by setting the drop parameter to False, we don’t lose the column), then concat horizontally:

out = (pd.concat([df.set_index(f'Date{i+1}', drop=False) 
                 for i, df in enumerate([df1, df2, df3])], axis=1)
       .sort_index().reset_index(drop=True))

Output:

        Date1       Date2       Date3
0  2018-12-31         NaN         NaN
1  2019-09-30  2019-09-30  2019-09-30
2         NaN         NaN  2021-06-30
3         NaN         NaN  2021-11-30
4  2022-01-31         NaN         NaN
5         NaN  2022-02-28         NaN
6         NaN         NaN  2022-03-31
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading