Is there a way to exclude rows that take certain values when aggregating?
For example:
ID | Company | Cost
1 | Us | 2
1 | Them | 1
1 | Them | 1
2 | Us | 1
2 | Them | 2
2 | Them | 1
I would like to do a groupby and sum but ignoring whenever a row is Company="us".
The result should be something like this:
ID | Sum of cost
1 | 2
2 | 3
I solved it by doing this, but I want to know if there’s a smarter solution:
df_agg = df[df['Company']!="Us"][['ID','Cost']].groupby(['ID']).sum()
>Solution :
Use:
print (df)
ID Company Cost
0 1 Us 2
1 1 Them 1
2 1 Them 1
3 2 Us 1
4 2 Them 2
5 2 Them 1
6 3 Us 1 <- added new row for see difference
If need filter first and not matched groups (if exist) are not important use:
df1 = df[df.Company!="Us"].groupby('ID', as_index=False).Cost.sum()
print (df1)
ID Cost
0 1 2
1 2 3
df1 = df.query('Company!="Us"').groupby('ID', as_index=False).Cost.sum()
print (df1)
ID Cost
0 1 2
1 2 3
If need all groups ID with Cost=0 for Us first set Cost to 0 and then aggregate:
df2 = (df.assign(Cost = df.Cost.where(df.Company!="Us", 0))
.groupby('ID', as_index=False).Cost
.sum())
print (df2)
ID Cost
0 1 2
1 2 3
2 3 0