I have two Dataframes with the same multi-index format & names. I am trying to perform a merge of the two Dataframe so that in the innermost index (‘Model’) the values from each Dataframe are combined but only have one ‘units sold’ column given that the model values in each Dataframe are unique. The code below gives me two columns one each for the left and right Dataframe. I just want one column.
german_cars_sold = pd.MultiIndex.from_product([['Tom','Jerry'], ['2022', '2023'], ['Q1', 'Q2', 'Q3', 'Q4'], ['Porsche', 'Audi', 'Benz']],
names=['SalesPerson', 'Year', 'Quarter', 'Model'])
columns = ['units sold']
german = pd.DataFrame(np.arange(48).reshape((len(german_cars_sold), len(columns))),
index=german_cars_sold, columns=columns)
US_cars_sold = pd.MultiIndex.from_product([['Tom','Jerry'], ['2022', '2023'], ['Q1', 'Q2', 'Q3', 'Q4'], ['Telsa', 'Ford', 'Jeep']],
names=['SalesPerson', 'Year', 'Quarter', 'Model'])
columns = ['units sold']
US = pd.DataFrame(np.arange(48).reshape((len(US_cars_sold), len(columns))),
index=US_cars_sold, columns=columns)
combined = german.merge(US, on=['SalesPerson', 'Year', 'Quarter', 'Model'], how='outer')
>Solution :
I believe what you want is not a merge but to combine the two DataFrames with a concat:
combined = pd.concat([german, US]).sort_index()
Which could also be done with an addition (add):
combined = german.add(US, fill_value=0).convert_dtypes()
Note that the two approaches will give the same output since you do not have overlapping indices. If you did, concat would give you two rows for the overlapping indices while add would give the sum of the identical indices.
Output:
units sold
SalesPerson Year Quarter Model
Jerry 2022 Q1 Audi 25
Benz 26
Ford 25
Jeep 26
Porsche 24
... ...
Tom 2023 Q4 Benz 23
Ford 22
Jeep 23
Porsche 21
Telsa 21
[96 rows x 1 columns]