I am trying to merge two dataframes such that i end up with one with same number of columns but but an increased row count.
import pandas as pd, numpy as np
data1 = [['date' , 'symbol', 'value'],
['1999-01-10', 'AAA', 101],
['1999-01-11', 'AAA', 201]]
I am trying to merge two dataframes such that i end up with one with same number of columns but row count should increase
import pandas as pd, numpy as np
data1 = [['date' , 'symbol', 'value'],
['1999-01-10', 'AAA', 101],
['1999-01-11', 'AAA', 201]]
data2 = [['date' , 'symbol', 'value'],
['1999-01-10', 'BBB', 101],
['1999-01-11', 'BBB', 201]]
df1 = pd.DataFrame(data1[1:], columns=data1[:1])
df2 = pd.DataFrame(data2[1:], columns=data2[:1])
df = df1.merge(df2, on = ['date', 'symbol'], how='outer')
The code above produces an error on the merge line:
ValueError: The column label 'date' is not unique.
For a multi-index, the label must be a tuple with elements corresponding to each level.
I know i can achieve what i am seeking with pd.CONCAT in the above case, but i want to understand why merge is failing here given that the composite keys of date+symbol are different/unique?
Furthermore i dont understand the part about multi-index. there is no index except the ‘natural’ one on these dataframes.
>Solution :
The issue is how you create the df1/df2:
df1 = pd.DataFrame(data1[1:], columns=data1[0]) # <-- columns should be list, not list of lists
df2 = pd.DataFrame(data2[1:], columns=data2[0]) # <-- detto
Then:
df = df1.merge(df2, on=["date", "symbol"], how="outer")
print(df)
Prints:
date symbol value_x value_y
0 1999-01-10 AAA 101.0 NaN
1 1999-01-10 BBB NaN 101.0
2 1999-01-11 AAA 201.0 NaN
3 1999-01-11 BBB NaN 201.0