Is there a way to know the portion of a row in group in groupby?

Advertisements

I am working on this table:

plaza station name ID g1 g2 g3 location g_avg
45 1 aaa 45_1_aaa 4 5 6 sun 5.00
45 2 aaa 45_2_aaa 1 5 2 sun 2.67
46 1 bbb 46_1_bbb 8 9 3 sun 6.67
47 1 ccc 47_1_ccc 4 5 0 sun 3.00
47 2 ccc 47_2_ccc 3 4 7 sun 4.67
45 3 aaa 45_3_aaa 3 4 0 sun 2.33
34 1 ddd 34_1_ddd 19 29 28 moon 25.33
47 3 eee 47_3_eee 1 2 2 moon 1.67
48 1 fff 48_1_fff 1 2 1 moon 1.33
49 1 ggg 49_1_ggg 2 1 1 moon 1.33
49 2 ggg 49_2_ggg 1 1 1 moon 1.00

I am running a groupby on the location and ID columns and I want to get, for each row, the g_avg divided by the sum of g_average for rows in that location.

The code im using is this

data.groupby(['location', 'ID']).apply(lambda x: len(x['g_avg']) /x['g_avg'].sum() * 100)

and this is the result I get:

What I was hoping to achieve is this:

Where am I going wrong?

>Solution :

You can use groupby.transform to find the total g_avg for each location; then divide g_avg values by these sums:

df['ratio'] = df['g_avg'] / df.groupby('location')['g_avg'].transform('sum') * 100
out = df[['location','ID','ratio']]

Output:

   location        ID      ratio
0       sun  45_1_aaa  20.542317
1       sun  45_2_aaa  10.969597
2       sun  46_1_bbb  27.403451
3       sun  47_1_ccc  12.325390
4       sun  47_2_ccc  19.186524
5       sun  45_3_aaa   9.572720
6      moon  34_1_ddd  82.615786
7      moon  47_3_eee   5.446836
8      moon  48_1_fff   4.337900
9      moon  49_1_ggg   4.337900
10     moon  49_2_ggg   3.261579

Leave a ReplyCancel reply