Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Create count matrix based on two columns and respective range values pandas

I have a dataframe which looks like below,

name,value,id
meanerror,0.55,aa
meanamount,120,aa
meanerror,0.45,bb
meanamount,150,bb
meanerror,0.88,cc
meanamount,110,cc
meanerror,0.1,dd
meanamount,50,dd

I would like to create a matrix from this dataframe like below.

,         meanamount,    total_y
meanerror,0-100,100-200
0.0-0.5,    1,    1,      2   
0.5-1,      0,    2,      2
total_x,    1,    3

what I actually need is, in the matrix, each cell should contain count of ids which has value(from value column) in the range on both x and y axis of the matrix. i.e for example the first cell should contain count of ids with meanamount in range 0-100 and meanerror in range 0.0-5.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I have tried pandas pivot table and crosstab but unsure how to achieve this. Can anyone help?

>Solution :

  1. Create a pivot table:

    pt = df.pivot(index='id', columns='name', values='value')
    
    # name  meanamount  meanerror
    # id                         
    # aa         120.0       0.55
    # bb         150.0       0.45
    # cc         110.0       0.88
    # dd          50.0       0.10
    
  2. Cut the amounts and errors into bins:

    pt['meanamount'] = pd.cut(pt['meanamount'], bins=range(0, 300, 100))
    pt['meanerror'] = pd.cut(pt['meanerror'], bins=np.arange(0, 1.5, 0.5))
    
    # name  meanamount   meanerror
    # id                          
    # aa    (100, 200]  (0.5, 1.0]
    # bb    (100, 200]  (0.0, 0.5]
    # cc    (100, 200]  (0.5, 1.0]
    # dd      (0, 100]  (0.0, 0.5]
    
  3. Create a crosstab of error x amount:

    pd.crosstab(pt['meanerror'], pt['meanamount'], margins=True)
    
    # meanamount  (0, 100]  (100, 200]  All
    # meanerror                            
    # (0.0, 0.5]         1           1    2
    # (0.5, 1.0]         0           2    2
    # All                1           3    4
    
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading