Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pivoting a tall DataFrame to wide format with a MultiIndex for the columns

I have a long-format table that shows which suppliers are assigned to which service for a set of locations. For each location and service, there are an arbitrary number of assignment levels (eg if the 1st supplier can’t provide the service, work will be escalated to the 2nd supplier).

I’m trying to pivot this table to wide format, and I know that I’m definitely doing something wrong. In the expected wide-format table, after the location, there should be a double header where level 0 is the service name, and level 1 each assignment number. So when exported to Excel, each service name should span the arbitrary # of assignment columns.

The actual output is not the expected result. I think that I’m either pivoting wrong, or pd.pivot is not the right tool for this. The fact that I need to manually sort and adjust the MultiIndex after the pivot is telling.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

How can I adjust my code to get the expected output?

Sample Code

import pandas as pd
import numpy as np


sample_data = [
 {'Location': 'required',
  'ServiceSpecialty': 'required',
  'Assignment 1': 'optional',
  'Assignment 2': 'optional',
  'Assignment 3': 'optional',
  'Assignment 4': 'optional',
  'Assignment 5': 'optional'},
 {'Location': '123 Main Street',
  'ServiceSpecialty': 'Appliances',
  'Assignment 1': 'John Smith',
  'Assignment 2': np.nan,
  'Assignment 3': np.nan,
  'Assignment 4': np.nan,
  'Assignment 5': np.nan},
 {'Location': '123 Main Street',
  'ServiceSpecialty': 'Carpentry/Handyman Svcs',
  'Assignment 1': 'ACME Supplier A',
  'Assignment 2': 'Mom & Pop Shop',
  'Assignment 3': 'Amy Smith',
  'Assignment 4': np.nan,
  'Assignment 5': np.nan},
 {'Location': '123 Main Street',
  'ServiceSpecialty': 'Doors',
  'Assignment 1': 'Abugida',
  'Assignment 2': 'ACME Industries',
  'Assignment 3': 'Mom & Pop Shop',
  'Assignment 4': 'Amy Smith',
  'Assignment 5': 'John Smith'},
 {'Location': '456 Broadway Ave',
  'ServiceSpecialty': 'Appliances',
  'Assignment 1': 'John Smith',
  'Assignment 2': np.nan,
  'Assignment 3': np.nan,
  'Assignment 4': np.nan,
  'Assignment 5': np.nan},
 {'Location': '456 Broadway Ave',
  'ServiceSpecialty': 'Carpentry/Handyman Svcs',
  'Assignment 1': 'ACME Supplier A',
  'Assignment 2': 'Mom & Pop Shop',
  'Assignment 3': 'Amy Smith',
  'Assignment 4': np.nan,
  'Assignment 5': np.nan},
 {'Location': '456 Broadway Ave',
  'ServiceSpecialty': 'Doors',
  'Assignment 1': 'Abugida',
  'Assignment 2': 'ACME Industries',
  'Assignment 3': 'Mom & Pop Shop',
  'Assignment 4': 'Amy Smith',
  'Assignment 5': 'John Smith'}
]

df = pd.DataFrame.from_dict(sample_data)

# Remove the unneeded secondary header (required/optional), and pivot
df = df.drop(index=df.index[0], axis=0)
df = df.pivot(index='Location', columns='ServiceSpecialty').reset_index()

# Remove any fully blank columns (eg "ATM" only has 1 assignment, but "Carpentry/Handyman Svcs" has 3).
# We only want to retain the populated "Assignment" columns
df = df.dropna(axis=1, how='all')

# Clean up the MultiIndex header (remove the names, and put the ServiceSpecialty above the Assignment
# columns. This will cause the ServiceSpecialty to span all the Assignment columns when exported to Excel)
df.index.name = None
df.columns.names = (None, None)
df.columns = df.columns.swaplevel(0,1)

# Sort the MultiIndex columns by the ServiceSpecialty, not the Assignment
cols_location = [('', 'Location')]
cols_assignments = [c for c in df.columns if c[0] != '']
cols_assignments.sort(key=lambda c: c[0])
cols_updated = cols_location + cols_assignments
df.columns = pd.MultiIndex.from_tuples(cols_updated)

# Write the output to file
with pd.ExcelWriter('current_output.xlsx', engine='xlsxwriter') as writer:
    df.to_excel(writer)

Sample Input

enter image description here

Current Output

enter image description here

Expected Output

enter image description here

>Solution :

IIUC, you can use just .swaplevel():

df = df.drop(index=df.index[0], axis=0)
df = (
    df.pivot(index="Location", columns="ServiceSpecialty")
    .reset_index()
    .swaplevel(axis=1)
    .sort_index(axis=1)
    .dropna(axis=1, how="all")
    .rename_axis(index=None, columns=(None, None))
)
print(df)

Prints:

                      Appliances Carpentry/Handyman Svcs                                     Doors                                                           
           Location Assignment 1            Assignment 1    Assignment 2 Assignment 3 Assignment 1     Assignment 2    Assignment 3 Assignment 4 Assignment 5
0   123 Main Street   John Smith         ACME Supplier A  Mom & Pop Shop    Amy Smith      Abugida  ACME Industries  Mom & Pop Shop    Amy Smith   John Smith
1  456 Broadway Ave   John Smith         ACME Supplier A  Mom & Pop Shop    Amy Smith      Abugida  ACME Industries  Mom & Pop Shop    Amy Smith   John Smith
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading