Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to get a group by with aggregations considering the value of the columns of a dataframe

i have a pandas dataframe like this:

id    gender  column_1  column_2  column_3  column_n
10      male     a          b         a           b
10      female   b          c         b           c
10      male     c          c         a           a
10      male     b          a         a           b

I want to get this as output:

id    column_name     male_%_a  male_%_b   male_%_c   female_%_a  female_%_b   female_%_c
10    column_1           33.3       33.3       33.3       0            100        0
10    column_2           33.3       33.3       33.3       0             0        100
10    column_3           100        0           0         0            100        0
10    column_n           33.3      66.7         0         0             0        100

The dataframe can have any number of column_1 to column_n and will always have a column idand gender

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I tried to groupby like this: df.gropby(‘gender’) but i am having a hard time to get the calculations to get the % of each label a,b,c

>Solution :

You can melt, crosstab and unstack:

df2 = df.melt(['id', 'gender'], var_name='columns_name')

out = (
 pd.crosstab([df2['id'], df2['gender'], df2['columns_name']], df2['value'])
   .pipe(lambda d: d.div(d.sum(axis=1), axis=0).mul(100).round(2))
   .unstack('gender')
)

out.columns = out.columns.map(lambda x: f'{x[1]}_%_{x[0]}')

out = out.reset_index()

Output:

   id columns_name  female_%_a  male_%_a  female_%_b  male_%_b  female_%_c  male_%_c
0  10     column_1         0.0     33.33       100.0     33.33         0.0     33.33
1  10     column_2         0.0     33.33         0.0     33.33       100.0     33.33
2  10     column_3         0.0    100.00       100.0      0.00         0.0      0.00
3  10     column_n         0.0     33.33         0.0     66.67       100.0      0.00
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading