I want to calculate the total sum of all parameter in a dataframe except one named Production-avg in the below python code:
code:
import pandas as pd
# Example DataFrame
df = pd.DataFrame({'category': ['Production', 'Production', 'Production', 'Production', 'Production'], 'parameter': ['hello', 'Production-avg', 'Production-etf', 'ktf', 'kfc'], 'actuals': [10, 30, 50, 70, 90], 'bp': [15, 35, 55, 75, 95]})
df.loc['total'] = df.select_dtypes(include=['int64']).sum()
print(df)
current output:
category parameter actuals bp
0 Production hello 10.0 15.0
1 Production Production-avg 30.0 35.0
2 Production Production-etf 50.0 55.0
3 Production ktf 70.0 75.0
4 Production kfc 90.0 95.0
total NaN NaN 250.0 275.0
Expected output:
category parameter actuals bp
0 Production hello 10.0 15.0
1 Production Production-avg 30.0 35.0
2 Production Production-etf 50.0 55.0
3 Production ktf 70.0 75.0
4 Production kfc 90.0 95.0
total NaN NaN 220.0 240.0
Can anyone suggest a solution to get the expected output?
>Solution :
You can use boolean indexing to filter out rows where parameter column value is Production-avg
df.loc['total'] = (df.select_dtypes('number').drop('total')
[df['parameter'] != 'Production-avg'].sum())
# or
df.loc['total'] = (df.query('parameter != "Production-avg"')
.select_dtypes('number').drop('total').sum())
As mozway suggests, some modification in your original code:
- Use
.select_dtypes('number')to select numeric columns to includeintandfloattype. - Use
.drop('total')to drop the existingtotalrow to avoid summing that row to new result.
print(df)
category parameter actuals bp
0 Production hello 10.0 15.0
1 Production Production-avg 30.0 35.0
2 Production Production-etf 50.0 55.0
3 Production ktf 70.0 75.0
4 Production kfc 90.0 95.0
total NaN NaN 220.0 240.0