Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Merge pandas dataframes with multiindex columns

How can I merge two dataframes in pandas that both have multi-index columns. The merge should be an outer merge on a column that is present in both dataframes. See minimal example and resulting error below:

import pandas as pd

# Sample DataFrames with multi-index columns
data1 = {
    ('A', 'X'): [1, 2, 3],
    ('A', 'Y'): [4, 5, 6],
    ('B', 'X'): [7, 8, 9],
    ('B', 'Y'): [10, 11, 12],
}

data2 = {
    ('A', 'X'): [13, 14, 15],
    ('A', 'Y'): [16, 17, 18],
    ('B', 'X'): [19, 20, 21],
    ('B', 'Y'): [22, 23, 24],
}

df1 = pd.DataFrame(data1, index=['row1', 'row2', 'row3'])
df2 = pd.DataFrame(data2, index=['row1', 'row2', 'row3'])

# Merge on a specific column (e.g., ('A', 'X'))
column_to_merge_on = ('A', 'X')
merged_df = pd.merge(df1, df2, left_on=column_to_merge_on, right_on=column_to_merge_on)

print(merged_df)

Resulting error:

ValueError: The column label 'A' is not unique.
For a multi-index, the label must be a tuple with elements corresponding to each level.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

For me working add list like [('A', 'X')]:

#changed data for match
data1 = {
    ('A', 'X'): [1, 2, 13],
    ('A', 'Y'): [4, 5, 6],
    ('B', 'X'): [7, 8, 9],
    ('B', 'Y'): [10, 11, 12],
}

data2 = {
    ('A', 'X'): [13, 14, 15],
    ('A', 'Y'): [16, 17, 18],
    ('B', 'X'): [19, 20, 21],
    ('B', 'Y'): [22, 23, 24],
}

df1 = pd.DataFrame(data1, index=['row1', 'row2', 'row3'])
df2 = pd.DataFrame(data2, index=['row1', 'row2', 'row3'])


# Merge on a specific column (e.g., ('A', 'X'))
column_to_merge_on = [('A', 'X')]
merged_df = pd.merge(df1, df2, 
                     left_on=column_to_merge_on, 
                     right_on=column_to_merge_on, how='outer')

print(merged_df)
    A   A_x       B_x         A_y   B_y      
    X     X    Y    X     Y     Y     X     Y
0   1   1.0  4.0  7.0  10.0   NaN   NaN   NaN
1   2   2.0  5.0  8.0  11.0   NaN   NaN   NaN
2  13  13.0  6.0  9.0  12.0  16.0  19.0  22.0
3  14   NaN  NaN  NaN   NaN  17.0  20.0  23.0
4  15   NaN  NaN  NaN   NaN  18.0  21.0  24.0
    
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading