I am trying to do the next example. I want to get the total number of survivors (survived) and the total amount paid per class (fared) using the Titanic dataset.
import pandas as pd df = pd.read_csv('https://raw.githubusercontent.com/bvalgard/data/main/titanic.csv') df.groupby(['pclass'])[['survived', 'fare']].sum()
When I run this code, I only get the total number of survivors, but not the total amount paid. However, if I use other functions such as
.max() etc it works.
What is the problem with the
.sum() function then?
This is because you don’t have numerical values but objects (
groupby.sum silently drops invalid columns. You must convert
(df.assign(fare=pd.to_numeric(df['fare'], errors='coerce')) .groupby(['pclass'])[['survived', 'fare']].sum() )
survived fare pclass 1 200 28265.4043 2 119 5866.6374 3 181 9418.4452
Why does it fail?
groupby.sum only works with numeric data by default (
groupby.max has a default of
False as it’s more flexible to handle a wider range of data types (you can for instance compare strings,
max(['A', 'B']) give
Can’t I use
Yes, you can, but this would concatenate the strings rather than adding numerical values, which is not what you want:
df.groupby(['pclass'])[['survived', 'fare']].sum(numeric_only=False) survived fare pclass 1 200 211.3375151.55151.55151.55151.5526.5577.958305... 2 119 24241311.510.526261311.510.51310.512.525262626... 3 181 7.5520.2520.2520.257.657.657.9257.22927.258.05...