Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Python – trouble pivoting, grouping, and summing dataframe columns

I have this code. I need to group by CustomerName and then sum the filegroups.

def consolidated_df():
    df = breakdown_df()
    df.pivot_table(index='CustomerName', columns='FileGroup', aggfunc="sum")
    return df

breakdown_df() looks like

ID    CustomerName    FileGroup    Size        Size(Bytes)
1     CustomerA       Database     99.8 M      104667648
1     CustomerA       Database     99.8 M      104667648
1     CustomerA       Backup       99.8 M      104667648
1     CustomerA       Backup       99.8 M      104667648
1     CustomerA       Site         99.8 M      104667648
1     CustomerA       Site         99.8 M      104667648
2     CustomerB       Database     99.8 M      104667648
2     CustomerB       Database     99.8 M      104667648
2     CustomerB       Backup       99.8 M      104667648
2     CustomerB       Backup       99.8 M      104667648
2     CustomerB       Site         99.8 M      104667648
2     CustomerB       Site         99.8 M      104667648

I am trying to roll it up into

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

ID    CustomerName    DatabaseSize    DatabaseSizeBytes    BackupSize     BackupSizeBytes     SiteSize            SiteSizeByte         TotalSize
1     CustomerA       [Total Size]    [Total Size Bytes]   [TotalSize]    [Total Size Bites]  [Total Site Size]   [Total Site Bites]   [Total Bytes for everything]
2     CustomerB       [Total Size]    [Total Size Bytes]   [TotalSize]    [Total Size Bites]  [Total Site Size]   [Total Site Bites]   [Total Bytes for everything]

I’m not so worried about actually summing Size because I can convert the bites. I just can’t seem to get my pivot to work and unsure where I am going wrong.

>Solution :

If you don’t explicitly set values, it’ll try to use all remaining columns…

out = df.pivot_table(index='CustomerName', columns='FileGroup', values='Size(Bytes)', aggfunc='sum')
print(out)

Output:

FileGroup        Backup   Database       Site
CustomerName                                 
CustomerA     209335296  209335296  209335296
CustomerB     209335296  209335296  209335296

You can also have margins if desired:

df.pivot_table(index='CustomerName', 
               columns='FileGroup', 
               values='Size(Bytes)', 
               aggfunc='sum', 
               margins=True, 
               margins_name='TotalSize').drop('TotalSize')

# Output:

FileGroup        Backup   Database       Site  TotalSize
CustomerName                                            
CustomerA     209335296  209335296  209335296  628005888
CustomerB     209335296  209335296  209335296  628005888
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading