I have the following column in a data-frame (it is an example):
First row is: '{"100":10,"50":3,"-90":2}'.
Second row is: '{"100":70,"50":3,"-90":2,"-40":3}'.
I want to calculate a weighted average where the dictionary’s keys are the values and the dictionary’s values are the weights of the weighted average.
The final value of the first row should be: 64.666, which is (100*10+50*3-90*2)/(10+3+2); and the of the second row should be: 87.82.
For each dictionary there might be hundreds of keys/values and the column might have thousands of rows. How can I code it efficiently? Preferably vectorially.
>Solution :
You can use json.loads and pandas.Series.apply.
import json
def cal_avg(dct):
dct = json.loads(dct)
return sum(int(k)*v for k,v in dct.items()) / sum(dct[k] for k in dct)
df['dct'].apply(cal_avg)
Output:
0 64.666667
1 87.820513
Name: dct, dtype: float64
Input DataFrame:
import pandas as pd
df = pd.DataFrame({
'dct': [
'{"100":10,"50":3,"-90":2}',
'{"100":70,"50":3,"-90":2,"-40":3}'
]
})