How can I merge two dataframes in pandas that both have multi-index columns. The merge should be an outer merge on a column that is present in both dataframes. See minimal example and resulting error below:
import pandas as pd
# Sample DataFrames with multi-index columns
data1 = {
('A', 'X'): [1, 2, 3],
('A', 'Y'): [4, 5, 6],
('B', 'X'): [7, 8, 9],
('B', 'Y'): [10, 11, 12],
}
data2 = {
('A', 'X'): [13, 14, 15],
('A', 'Y'): [16, 17, 18],
('B', 'X'): [19, 20, 21],
('B', 'Y'): [22, 23, 24],
}
df1 = pd.DataFrame(data1, index=['row1', 'row2', 'row3'])
df2 = pd.DataFrame(data2, index=['row1', 'row2', 'row3'])
# Merge on a specific column (e.g., ('A', 'X'))
column_to_merge_on = ('A', 'X')
merged_df = pd.merge(df1, df2, left_on=column_to_merge_on, right_on=column_to_merge_on)
print(merged_df)
Resulting error:
ValueError: The column label 'A' is not unique.
For a multi-index, the label must be a tuple with elements corresponding to each level.
>Solution :
For me working add list like [('A', 'X')]:
#changed data for match
data1 = {
('A', 'X'): [1, 2, 13],
('A', 'Y'): [4, 5, 6],
('B', 'X'): [7, 8, 9],
('B', 'Y'): [10, 11, 12],
}
data2 = {
('A', 'X'): [13, 14, 15],
('A', 'Y'): [16, 17, 18],
('B', 'X'): [19, 20, 21],
('B', 'Y'): [22, 23, 24],
}
df1 = pd.DataFrame(data1, index=['row1', 'row2', 'row3'])
df2 = pd.DataFrame(data2, index=['row1', 'row2', 'row3'])
# Merge on a specific column (e.g., ('A', 'X'))
column_to_merge_on = [('A', 'X')]
merged_df = pd.merge(df1, df2,
left_on=column_to_merge_on,
right_on=column_to_merge_on, how='outer')
print(merged_df)
A A_x B_x A_y B_y
X X Y X Y Y X Y
0 1 1.0 4.0 7.0 10.0 NaN NaN NaN
1 2 2.0 5.0 8.0 11.0 NaN NaN NaN
2 13 13.0 6.0 9.0 12.0 16.0 19.0 22.0
3 14 NaN NaN NaN NaN 17.0 20.0 23.0
4 15 NaN NaN NaN NaN 18.0 21.0 24.0