Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas merge complaining about non unique labels when key is a composite and unique

I am trying to merge two dataframes such that i end up with one with same number of columns but but an increased row count.

import pandas as pd, numpy as np

data1 = [['date'  ,    'symbol', 'value'],
         ['1999-01-10', 'AAA',      101],
         ['1999-01-11', 'AAA',      201]]
I am trying to merge two dataframes such that i end up with one with same number of columns but row count should increase


import pandas as pd, numpy as np

data1 = [['date'  ,    'symbol', 'value'],
         ['1999-01-10', 'AAA',      101],
         ['1999-01-11', 'AAA',      201]]

data2 = [['date'  ,    'symbol', 'value'],
         ['1999-01-10', 'BBB',      101],
         ['1999-01-11', 'BBB',      201]]


df1 = pd.DataFrame(data1[1:], columns=data1[:1])
df2 = pd.DataFrame(data2[1:], columns=data2[:1])
df = df1.merge(df2, on = ['date', 'symbol'], how='outer')

The code above produces an error on the merge line:

ValueError: The column label 'date' is not unique.
For a multi-index, the label must be a tuple with elements corresponding to each level.

I know i can achieve what i am seeking with pd.CONCAT in the above case, but i want to understand why merge is failing here given that the composite keys of date+symbol are different/unique?
Furthermore i dont understand the part about multi-index. there is no index except the ‘natural’ one on these dataframes.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

The issue is how you create the df1/df2:

df1 = pd.DataFrame(data1[1:], columns=data1[0])  # <-- columns should be list, not list of lists
df2 = pd.DataFrame(data2[1:], columns=data2[0])  # <-- detto

Then:

df = df1.merge(df2, on=["date", "symbol"], how="outer")
print(df)

Prints:

         date symbol  value_x  value_y
0  1999-01-10    AAA    101.0      NaN
1  1999-01-10    BBB      NaN    101.0
2  1999-01-11    AAA    201.0      NaN
3  1999-01-11    BBB      NaN    201.0
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading