I have a Pandas DataFrame for which I would like to calculate weighted means within each group defined by the column ‘Class’.
import pandas as pd
import numpy as np
df_test = pd.DataFrame.from_dict({
    "Class": ["A", "A", "A", "B", "B", "B"],
    "X": [0, 1, 2, 3, 4, 5],
    "Y": [0, 1, 2, 3, 4, 5],
    "Z": [0, 1, 2, 3, 4, 5],
    "W": [1, 1, 1, 2, 2, 2],
})
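def GetWMean(group):
    # Weighted mean of X, Y and Z within one group, using W as the weights.
    Q = group[["X", "Y", "Z"]]
    W = group["W"]
    Wms = W.dot(Q) / W.sum()
    return Wms

# One row per Class, one column per weighted mean.
WMs = df_test.groupby("Class").apply(GetWMean)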
I would like the result to behave like a transform: three new columns holding the weighted means I calculated, with each group's value repeated on every row that belongs to that group.
How can I achieve this?
>Solution :
IIUC, just join them afterwards:
df_test.join(WMs, on="Class", rsuffix="_WMean")
Output:
  Class  X  Y  Z  W  X_WMean  Y_WMean  Z_WMean
0     A  0  0  0  1      1.0      1.0      1.0
1     A  1  1  1  1      1.0      1.0      1.0
2     A  2  2  2  1      1.0      1.0      1.0
3     B  3  3  3  2      4.0      4.0      4.0
4     B  4  4  4  2      4.0      4.0      4.0
5     B  5  5  5  2      4.0      4.0      4.0
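
If you would rather skip the apply-then-join step, the same columns can probably be built directly with groupby transform. This is only a sketch, not part of the answer above: it assumes the weights in W do not sum to zero within any group, and the names value_cols, weighted, weighted_sums, weight_sums, wmeans and result are made up for illustration.

```python
import pandas as pd

df_test = pd.DataFrame({
    "Class": ["A", "A", "A", "B", "B", "B"],
    "X": [0, 1, 2, 3, 4, 5],
    "Y": [0, 1, 2, 3, 4, 5],
    "Z": [0, 1, 2, 3, 4, 5],
    "W": [1, 1, 1, 2, 2, 2],
})

# Hypothetical helper name: the columns to average.
value_cols = ["X", "Y", "Z"]

# Multiply each value column by the row's weight.
weighted = df_test[value_cols].mul(df_test["W"], axis=0)

# Per-group sums, broadcast back onto every row of the group.
weighted_sums = weighted.groupby(df_test["Class"]).transform("sum")
weight_sums = df_test.groupby("Class")["W"].transform("sum")

# Weighted mean = sum(w * x) / sum(w); assumes no group has zero total weight.
wmeans = weighted_sums.div(weight_sums, axis=0).add_suffix("_WMean")

# Attach the new columns next to the original ones.
result = pd.concat([df_test, wmeans], axis=1)
print(result)
```

Because transform keeps the original index, the new columns line up with df_test row by row, so no join key is needed.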