Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Groupby everything but one column

I have a pandas dataframe that initially has these columns below and grows as the program continues.

‘Branch_ID’, ‘Region’, ‘ActualRegion’, ‘Lease_Id’, ‘Year’, ‘Make’,
‘Model’, ‘VIN’, ‘Plate_Number’, ‘Full_Name’, ‘Employee Number’, ‘Job
Title’, ‘Device Serial Number’, ‘Blackout since’, ‘Covered’

Currently I have multiple groupby Statements that look similar to this. This example sums everything in the last column called Miles Driven

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

report = report.groupby(['Branch_ID', 'Region', 'ActualRegion', 'Lease_Id', 
                         'Year', 'Make',  'Model',  'VIN',  
                        'Plate_Number',  'Full_Name', 'Employee Number', 'Job Title',
                      'Device Serial Number', 'Blackout since',  'Covered']).sum().reset_index()

I have to do a similar process multiple times and each time I do it a new column is added.. I am trying to create a reusable function to simplify and get rid of redundant code.

What I have tried; in my head the code above should be able to look like this.. I am summing a column called Miles Driven if there are duplicates in the other columns

columns_to_group = report.columns.difference(['Miles Driven'])

report = report.groupby(columns_to_group).sum().reset_index()

This is much more elegant and will help me create a function that could shorten my code signficantly but I have tried many ways to get something similar to work and can’t.

The error above that I get is

raise ValueError("Grouper and axis must be same length")

If I print out columns_to_group it is identical to what I am inserting in the groupby statement above

Thank you so much for any help!

>Solution :

If I print out columns_to_group it is identical to what I am inserting
in the groupby statement above

Whilst it might looks like list, it is not, convert it to list and it should work. Simple example

import pandas as pd
df = pd.DataFrame({'X':[0,0,1,1],'Y':[1,1,0,0],'Z':[1,10,100,1000]})
group_cols = df.columns.difference(['Z'])
df_sum = df.groupby(list(group_cols)).sum().reset_index()
print(df_sum)

output

   X  Y     Z
0  0  1    11
1  1  0  1100

Note: I used own data sample for brevity sake.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading