I have a dataframe:
id value
a 0:3,1:0,2:0,3:4
a 0:0,1:0,2:2,3:0
a 0:0,1:5,2:4,3:0
How to get average values of keys in column value?
For example for 0:3,1:0,2:0,3:4 it must be (0+0+0+3+3+3+3)/7 = 1.71.
For 0:0,1:0,2:2,3:0 it must be 2+2/2=2.
For 0:0,1:5,2:4,3:0 it must be (1+1+1+1+1+2+2+2+2)/9 = 1.44.
So desired result is:
id value
a 1.71
a 1.66
a 1.44
How to do that? Its not a dictionary values, so I dont really understand how to operate with keys and values here
>Solution :
You could use:
g = (df.join(df['value'].str.split(',')
.explode()
.str.split(':', expand=True)
.astype(int)
)
.assign(value=lambda d: d[0]*d[1])
.set_index('id', append=True)
.groupby(level=[0,1])
)
g['value'].sum()/g[1].sum()
output:
id
0 a 1.7143
1 a 2.0000
2 a 1.4444
how it works:
1- splitting the values and create new columns
(df.join(df['value'].str.split(',')
.explode()
.str.split(':', expand=True)
.astype(int)
)
.assign(value=lambda d: d[0]*d[1])
.set_index('id', append=True)
)
value 0 1
id
0 a 0 0 3
a 0 1 0
a 0 2 0
a 12 3 4
1 a 0 0 0
a 0 1 0
a 4 2 2
a 0 3 0
2 a 0 0 0
a 5 1 5
a 8 2 4
a 0 3 0
2- computing sum of values and sum of weights to get the mean
>>> g['value'].sum()
id
0 a 12
1 a 4
2 a 13
>>> g[1].sum()
id
0 a 7
1 a 2
2 a 9
>>> g['value'].sum()/g[1].sum()
id
0 a 1.7143
1 a 2.0000
2 a 1.4444