How to get average values from specific value type?

November 12, 2021

I have a dataframe:

id            value
a       0:3,1:0,2:0,3:4
a       0:0,1:0,2:2,3:0
a       0:0,1:5,2:4,3:0

How to get average values of keys in column value?

For example for 0:3,1:0,2:0,3:4 it must be (0+0+0+3+3+3+3)/7 = 1.71.

For 0:0,1:0,2:2,3:0 it must be 2+2/2=2.

For 0:0,1:5,2:4,3:0 it must be (1+1+1+1+1+2+2+2+2)/9 = 1.44.

So desired result is:

id            value
a              1.71
a              1.66
a              1.44

How to do that? Its not a dictionary values, so I dont really understand how to operate with keys and values here

>Solution :

You could use:

g = (df.join(df['value'].str.split(',')
                        .explode()
                        .str.split(':', expand=True)
             .astype(int)
            )
       .assign(value=lambda d: d[0]*d[1])
       .set_index('id', append=True)
       .groupby(level=[0,1])
      )
g['value'].sum()/g[1].sum()

output:

   id
0  a       1.7143
1  a       2.0000
2  a       1.4444

how it works:

1- splitting the values and create new columns

(df.join(df['value'].str.split(',')
                        .explode()
                        .str.split(':', expand=True)
             .astype(int)
            )
   .assign(value=lambda d: d[0]*d[1])
   .set_index('id', append=True)
)

      value  0  1
  id             
0 a       0  0  3
  a       0  1  0
  a       0  2  0
  a      12  3  4
1 a       0  0  0
  a       0  1  0
  a       4  2  2
  a       0  3  0
2 a       0  0  0
  a       5  1  5
  a       8  2  4
  a       0  3  0

2- computing sum of values and sum of weights to get the mean

>>> g['value'].sum()
   id
0  a     12
1  a      4
2  a     13

>>> g[1].sum()
   id
0  a     7
1  a     2
2  a     9 

>>> g['value'].sum()/g[1].sum()
   id
0  a       1.7143
1  a       2.0000
2  a       1.4444