Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Find positive and negative bin limits based on multiple other columns

I have a dataframe like as shown below

   ID   raw_val  var_name   constant   s_value   
   1    388       Qty        0.36       -0.032   
   2    120       Qty        0.36       -0.007
   3    34        Qty        0.36       0.16
   4    45        Qty        0.36       0.31
   1    110       F1         0.36       -0.232   
   2    1000      F1         0.36       -0.17
   3    318       F1         0.36       0.26
   4    419       F1         0.36       0.31

My objective is to

a) Find the upper and lower limits (of raw_val) for each value of var_name for s_value >=0

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

b) Find the upper and lower limits (of raw_val) for each value of var_name for s_value <0

I tried the below

df['sign'] = np.where[df['s_value']<0, 'neg', 'pos']
s = df.groupby(['var_name','sign'])['raw_val'].series
df['buckets'] = pd.IntervalIndex.from_arrays(s)

Please note that my real data is big data and has more than 200 unique values for var_name column. The distribution of positive and negative values (s_value) may be uneven for each value of the var_name columns. In sample df, I have shown even distribution of pos and neg values but it may not be the case in real life.

I expect my output to be like as below

var_name sign  low_limit   upp_limit
 Qty     neg     120          388
 F1      neg     110          1000
 Qty     pos     34            45
 Qty     pos     318          419

>Solution :

I think numpy.where with aggregate minimal and maximal values is way:

df['sign'] = np.where(df['s_value']<0, 'neg', 'pos')
df1 = (df.groupby(['var_name','sign'], sort=False, as_index=False)
         .agg(low_limit=('raw_val','min'), upp_limit=('raw_val','max')))
print (df1)
  var_name sign  low_limit  upp_limit
0      Qty  neg        120        388
1      Qty  pos         34         45
2       F1  neg        110       1000
3       F1  pos        318        419
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading