Pandas groupby columns and multiply two other columns in the aggregate function

Hopefully this is an easy problem for some Stack Overflow helpers! I have a dataframe:

df = pd.DataFrame({'Quantity': [2, 3, 4, 1, 2, 1, 4, 5],
                   'User': ['A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
                   'Price': [5, 3, 2, 6, 2, 3, 4, 5],
                   'Shop': ['X', 'X', 'X', 'Y', 'Z', 'Z', 'X', 'Y'],
                   'Day': ['M', 'T', 'W', 'W', 'M', 'T', 'M', 'W']
                   })


   Quantity User  Price Shop Day
0         2    A      5    X   M
1         3    A      3    X   T
2         4    B      2    X   W
3         1    B      6    Y   W
4         2    B      2    Z   M
5         1    C      3    Z   T
6         4    C      4    X   M
7         5    C      5    Y   W

My trouble comes when I try to aggregate it by shop and day. I'm hoping to get, for each shop and day, the count of users and the average amount spent in that shop on that day. In SQL that part would be AVG(Quantity * Price).

I have the first part:

df.groupby(by=['Shop','Day']).agg({'User': 'count'})

But my only solution for the other aggregation is to first create a column and then aggregate it:

df['Spend'] = df['Price'] * df['Quantity']
df.groupby(by=['Shop', 'Day']).agg({'User': 'count', 'Spend': 'mean'})

Is there a better method I am missing? Ideally I want this aggregation to happen alongside the count aggregate, without needing to create a new column first.

> Solution:

You can achieve that without creating a new column, like so:

result = df.groupby(by=['Shop', 'Day']).agg(
    User=('User', 'count'),
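    # x is the group's 'Price' Series; x.index lines it up with the matching 'Quantity' rows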
    Spend=('Price', lambda x: (x * df.loc[x.index, 'Quantity']).mean())
)

The output is the same as yours:

          User  Spend
Shop Day             
X    M       2   13.0
     T       1    9.0
     W       1    8.0
Y    W       2   15.5
Z    M       1    4.0
     T       1    3.0
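
As a side note (a sketch, not part of the answer above): if the objection is only to mutating the original dataframe, you can also chain assign into the pipeline. assign returns a copy, so the temporary Spend column exists only inside the expression and df itself is left untouched:

result = (
    df.assign(Spend=df['Price'] * df['Quantity'])          # temporary column on a copy
      .groupby(by=['Shop', 'Day'])
      .agg(User=('User', 'count'), Spend=('Spend', 'mean'))
)

Both versions produce the same table; the lambda avoids materialising the intermediate column at all, while the assign chain is arguably easier to read.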
