Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Weighted average of a dictionary – Pandas

I have the following column in a data-frame (it is an example):

First row is: '{"100":10,"50":3,"-90":2}'.

Second row is: '{"100":70,"50":3,"-90":2,"-40":3}'.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I want to calculate a weighted average where the dictionary’s keys are the values and the dictionary’s values are the weights of the weighted average.

The final value of the first row should be: 64.666, which is (100*10+50*3-90*2)/(10+3+2); and the of the second row should be: 87.82.

For each dictionary there might be hundreds of keys/values and the column might have thousands of rows. How can I code it efficiently? Preferably vectorially.

>Solution :

You can use json.loads and pandas.Series.apply.

import json

def cal_avg(dct):
    dct = json.loads(dct)
    return sum(int(k)*v for k,v in dct.items()) / sum(dct[k] for k in dct)

df['dct'].apply(cal_avg)

Output:

0    64.666667
1    87.820513
Name: dct, dtype: float64

Input DataFrame:

import pandas as pd
df = pd.DataFrame({
    'dct': [
        '{"100":10,"50":3,"-90":2}',
        '{"100":70,"50":3,"-90":2,"-40":3}'
    ]
})
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading