Generating a separate column that stores weighted average per group

This shouldn't be that hard, but I can't get it to work.

I have a long-format dataframe and want to calculate a weighted average of score per person, weighted by manager_weight, and store it in a separate column, 'w_mean_m'. This attempt:

df['w_mean_m'] = df.groupby('person')['score'].transform(lambda x: np.average(x['score'], weights=x['manager_weight']))

throws an error and I have no idea how to fix it.

Solution:

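GroupBy.transform processes each column separately, so the function only ever sees one column and cannot select several columns at once. Instead, use GroupBy.apply to compute one weighted average per group, then Series.map to write it into a new column (in the sample data the grouping column is named contact):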

s = (df.groupby('contact')
       .apply(lambda x: np.average(x['score'], weights=x['manager_weight'])))
df['w_mean_m'] = df['contact'].map(s)
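
To see why the original transform call fails, it can help to inspect what the function actually receives. The following is a minimal sketch on a small made-up frame; the helper `probe` and the list `seen_types` are hypothetical names used only for illustration.

```python
import pandas as pd

# Small illustrative frame (values are made up for this sketch)
df_demo = pd.DataFrame({
    "manager_weight": [1.0, 1.1, 1.2, 1.3],
    "score": [1, 1, 1, 2],
    "contact": ["a", "a", "b", "b"],
})

seen_types = []

def probe(x):
    # Record the type of object that transform hands to the function
    seen_types.append(type(x).__name__)
    return x

df_demo.groupby("contact")["score"].transform(probe)
print(set(seen_types))
```

Each call gets the score values of one group as a Series, so there is no 'manager_weight' column to index into, which is why x['score'] and x['manager_weight'] cannot work inside transform.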

A workaround that keeps transform is to look up the weights by index inside the function. This relies on the index being unique, hence the reset_index first:

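# The lookup by x.index below is only correct if the index labels are unique
df = df.reset_index(drop=True)

# transform passes each group's score values; fetch the matching weights by index
f = lambda x: np.average(x, weights=df.loc[x.index, "manager_weight"])
df['w_mean_m1'] = df.groupby('contact')['score'].transform(f)

print(df)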
    manager_weight  score contact  w_mean_m1
0              1.0      1       a   1.282609
1              1.1      1       a   1.282609
2              1.2      1       a   1.282609
3              1.3      2       a   1.282609
4              1.4      2       b   2.355556
5              1.5      2       b   2.355556
6              1.6      3       b   2.355556
7              1.7      3       c   3.770270
8              1.8      4       c   3.770270
9              1.9      4       c   3.770270
10             2.0      4       c   3.770270
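
As a possible alternative that avoids a Python lambda altogether, the weighted mean can be written as sum(score * weight) / sum(weight) per group, using two built-in "sum" transforms. This is a sketch of a different technique from the two above, not part of the original answer; the column name `w_mean_m2` is an assumption chosen for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "manager_weight": [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0],
    "score": [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4],
    "contact": ["a", "a", "a", "a", "b", "b", "b", "c", "c", "c", "c"],
})

# Numerator: per-group sum of score * weight, broadcast back to every row
weighted = df["score"] * df["manager_weight"]
numerator = weighted.groupby(df["contact"]).transform("sum")

# Denominator: per-group sum of the weights, broadcast back to every row
denominator = df.groupby("contact")["manager_weight"].transform("sum")

df["w_mean_m2"] = numerator / denominator
print(df)
```

On the sample data this should agree with the w_mean_m1 column shown above, up to floating-point rounding.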

Setup:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "manager_weight": [1.0,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2.0],
        "score": [1,1,1,2,2,2,3,3,4,4,4],
        "contact": ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'c']
    })