Home Pandas merge complaining about non unique labels when key is a composite and unique

Questions

Pandas merge complaining about non unique labels when key is a composite and unique

April 25, 2024

I am trying to merge two dataframes such that i end up with one with same number of columns but but an increased row count.

import pandas as pd, numpy as np

data1 = [['date'  ,    'symbol', 'value'],
         ['1999-01-10', 'AAA',      101],
         ['1999-01-11', 'AAA',      201]]
I am trying to merge two dataframes such that i end up with one with same number of columns but row count should increase


import pandas as pd, numpy as np

data1 = [['date'  ,    'symbol', 'value'],
         ['1999-01-10', 'AAA',      101],
         ['1999-01-11', 'AAA',      201]]

data2 = [['date'  ,    'symbol', 'value'],
         ['1999-01-10', 'BBB',      101],
         ['1999-01-11', 'BBB',      201]]


df1 = pd.DataFrame(data1[1:], columns=data1[:1])
df2 = pd.DataFrame(data2[1:], columns=data2[:1])
df = df1.merge(df2, on = ['date', 'symbol'], how='outer')

The code above produces an error on the merge line:

ValueError: The column label 'date' is not unique.
For a multi-index, the label must be a tuple with elements corresponding to each level.

I know i can achieve what i am seeking with pd.CONCAT in the above case, but i want to understand why merge is failing here given that the composite keys of date+symbol are different/unique?
Furthermore i dont understand the part about multi-index. there is no index except the ‘natural’ one on these dataframes.

>Solution :

The issue is how you create the df1/df2:

df1 = pd.DataFrame(data1[1:], columns=data1[0])  # <-- columns should be list, not list of lists
df2 = pd.DataFrame(data2[1:], columns=data2[0])  # <-- detto

Then:

df = df1.merge(df2, on=["date", "symbol"], how="outer")
print(df)

Prints:

         date symbol  value_x  value_y
0  1999-01-10    AAA    101.0      NaN
1  1999-01-10    BBB      NaN    101.0
2  1999-01-11    AAA    201.0      NaN
3  1999-01-11    BBB      NaN    201.0