Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Combine two dataframes with scalable columns by placing the second dataframe directly below the first dataframe without deleting the keys

Here are my 2 dataframes that I have:

Code:

import pandas as pd

data1 = {'Key1': ['Value1'], 'Key2': ['Value2'], 'Key3': ['Value3'], 
         'Key4': ['Value4'], 'Key5': ['Value5']}

data2 = {'Key1': ['Value1', 'Value4'], 
         'Key4': ['Value2', 'Value5'], 
         'Key7': ['Value3', 'Value6']}

df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

Dataframe 1:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Key1    Key2    Key3    Key4    Key5
Value1  Value2  Value3  Value4  Value5

Dataframe 2:

Key1    Key4    Key7
Value1  Value2  Value3
Value4  Value5  Value6

Expected Results:

Key1    Key2    Key3    Key4    Key5
Value1  Value2  Value3  Value4  Value5
Key1    Key4    Key7    NaN     NaN
Value1  Value2  Value3  NaN     NaN
Value4  Value5  Value6  NaN     NaN

How can I join both of these dataframes without merging the keys at all?

>Solution :

Assuming your input DataFrames have Key1/Key2/… as column headers, you can first move those to rows, then concat.

For that, two options, transpose and reset_index (which can be problemantic depending on the data types):

dfs = [df1, df2]

out = pd.concat([d.T.reset_index().T for d in dfs], ignore_index=True)

Or use a double concat:

dfs = [df1, df2]

out = pd.concat([pd.concat([pd.DataFrame([d.columns]),
                            pd.DataFrame(d.to_numpy())])
                 for d in dfs],
                ignore_index=True)

Reproducible input:

df1 = pd.DataFrame({'Key1': ['Value1'], 'Key2': ['Value2'], 'Key3': ['Value3'],
                    'Key4': ['Value4'], 'Key5': ['Value5']})
df2 = pd.DataFrame({'Key1': ['Value1', 'Value4'],
                    'Key2': ['Value2', 'Value5'],
                    'Key3': ['Value3', 'Value6']})

Output:

        0       1       2       3       4
0    Key1    Key2    Key3    Key4    Key5
1  Value1  Value2  Value3  Value4  Value5
2    Key1    Key2    Key3     NaN     NaN
3  Value1  Value2  Value3     NaN     NaN
4  Value4  Value5  Value6     NaN     NaN
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading