Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

pd.groupby in two columns sum() function does not work

I am trying to do the next example. I want to get the total number of survivors (survived) and the total amount paid per class (fared) using the Titanic dataset.

import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/bvalgard/data/main/titanic.csv')

df.groupby(['pclass'])[['survived', 'fare']].sum()

When I run this code, I only get the total number of survivors, but not the total amount paid. However, if I use other functions such as .min(), .max() etc it works.

What is the problem with the .sum() function then?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

This is because you don’t have numerical values but objects (df['fare'].dtype gives dtype('O'), df['survived'].dtype gives dtype('int64')), and groupby.sum silently drops invalid columns. You must convert to_numeric:

(df.assign(fare=pd.to_numeric(df['fare'], errors='coerce'))
   .groupby(['pclass'])[['survived', 'fare']].sum()
)

Output:

        survived        fare
pclass                      
1            200  28265.4043
2            119   5866.6374
3            181   9418.4452

Why does it fail?

groupby.sum only works with numeric data by default (numeric_only=True), while groupby.max has a default of False as it’s more flexible to handle a wider range of data types (you can for instance compare strings, max(['A', 'B']) give 'B').

Can’t I use numeric_only=False?

Yes, you can, but this would concatenate the strings rather than adding numerical values, which is not what you want:

df.groupby(['pclass'])[['survived', 'fare']].sum(numeric_only=False)

        survived                                               fare
pclass                                                             
1            200  211.3375151.55151.55151.55151.5526.5577.958305...
2            119  24241311.510.526261311.510.51310.512.525262626...
3            181  7.5520.2520.2520.257.657.657.9257.22927.258.05...
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading