Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Python Pandas multi index merging with a combined column of data

I have two Dataframes with the same multi-index format & names. I am trying to perform a merge of the two Dataframe so that in the innermost index (‘Model’) the values from each Dataframe are combined but only have one ‘units sold’ column given that the model values in each Dataframe are unique. The code below gives me two columns one each for the left and right Dataframe. I just want one column.

german_cars_sold = pd.MultiIndex.from_product([['Tom','Jerry'], ['2022', '2023'], ['Q1', 'Q2', 'Q3', 'Q4'], ['Porsche', 'Audi', 'Benz']],
                                               names=['SalesPerson', 'Year', 'Quarter', 'Model'])
columns = ['units sold']
german = pd.DataFrame(np.arange(48).reshape((len(german_cars_sold), len(columns))),
                    index=german_cars_sold, columns=columns)

US_cars_sold = pd.MultiIndex.from_product([['Tom','Jerry'], ['2022', '2023'], ['Q1', 'Q2', 'Q3', 'Q4'], ['Telsa', 'Ford', 'Jeep']],
                                          names=['SalesPerson', 'Year', 'Quarter', 'Model'])
columns = ['units sold']
US = pd.DataFrame(np.arange(48).reshape((len(US_cars_sold), len(columns))),
                    index=US_cars_sold, columns=columns)

combined = german.merge(US, on=['SalesPerson', 'Year', 'Quarter', 'Model'], how='outer')

>Solution :

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I believe what you want is not a merge but to combine the two DataFrames with a concat:

combined = pd.concat([german, US]).sort_index()

Which could also be done with an addition (add):

combined = german.add(US, fill_value=0).convert_dtypes()

Note that the two approaches will give the same output since you do not have overlapping indices. If you did, concat would give you two rows for the overlapping indices while add would give the sum of the identical indices.

Output:

                                  units sold
SalesPerson Year Quarter Model              
Jerry       2022 Q1      Audi             25
                         Benz             26
                         Ford             25
                         Jeep             26
                         Porsche          24
...                                      ...
Tom         2023 Q4      Benz             23
                         Ford             22
                         Jeep             23
                         Porsche          21
                         Telsa            21

[96 rows x 1 columns]
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading