Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Convert value counts of multiple columns to pandas dataframe

I have a dataset in this form:

Name  Batch  DXYR  Emp  Lateral  GDX  MMT  CN
Joe    2       0          2       2    2    0  
Alan   0       1          1       2    0    0
Josh   1       1          2       1    1    2 
Max    0       1          0       0    0    2

These columns can have only three distinct values ie. 0, 1 and 2..
So, I need percent of value counts for each column in pandas dataframe..

I have simply make a loop like:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

for i in df.columns:
  (df[i].value_counts()/df[i].count())*100

I am getting the output like:

0    90.608831
1     0.391169
2    9.6787899
Name: Batch, dtype: float64

0    95.545455
1     2.235422
2    2.6243553
Name: MX, dtype: float64

and so on...

These outputs are correct but I need it in pandas dataframe like this:

                 Batch  DXYR  Emp  Lateral  GDX     MMT    CN
Count_0_percent  98.32  52.5   22   54.5     44.2   53.4  76.01  
Count_1_percent  0.44   34.5   43   43.5     44.5   46.5  22.44
Count_2_percent  1.3    64.3   44   2.87     12.6   1.88  2.567

Can someone please suggest me how to get it

>Solution :

You can melt the data, then use pd.crosstab:

melt = df.melt('Name')
pd.crosstab(melt['value'], melt['variable'], normalize='columns')

Or a bit faster (yet more verbose) with melt and groupby().value_counts():

(df.melt('Name')
   .groupby('variable')['value'].value_counts(normalize=True)
   .unstack('variable', fill_value=0)
)

Output:

variable  Batch   CN  DXYR  Emp Lateral   GDX   MMT
value                                              
0          0.50  0.5  0.25         0.25  0.25  0.50
1          0.25  0.0  0.75         0.25  0.25  0.25
2          0.25  0.5  0.00         0.50  0.50  0.25

Update: apply also works:

df.drop(columns=['Name']).apply(pd.Series.value_counts, normalize=True)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading