Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Return label of the column which has the maximum value if it is higher than a certain number

I have a pandas dataframe with 18 columns, the columns from 2:17 hold numerical values with labels 2-17. I want to add a new column (‘category’) to the dataframe to include the label of the column which holds the maximum value if that value is higher than a certain number (0.5). I used the below code after utilizing a related question on #Pandas (python): max in columns define new value in new column

  test_df['Category'] = test_df.values[:,2:18].argmax(1) if (test_df.values[:,2:18].max(1) >=0.5) else 'none'

when I run the above code, I get the error message "The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()", any advice is appreciated.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

Use mask to hide row where no values are greater than 0.5 then extract the column name with idxmax.

test_df['Category'] = (
    test_df.iloc[:, 2:18].mask(~test_df.iloc[:, 2:18].ge(0.5).any(1), other=pd.NA)
           .idxmax(1).fillna('none')
)

Output:

>>> test_df
       A     B      col0      col1      col2      col3      col4      col5      col6      col7      col8      col9     col10     col11     col12     col13     col14     col15 Category
0  alpha  beta  0.004926  0.262662  0.059676  0.026302  0.360741  0.256310  0.472451  0.340764  0.472086  0.379545  0.437554  0.435562  0.438726  0.503708  0.193708  0.260441    col13
1  alpha  beta  0.178689  0.326015  0.514489  0.050754  0.391687  0.153947  0.157198  0.396039  0.009823  0.275651  0.454966  0.204654  0.111680  0.250095  0.297196  0.183910     col2
2  alpha  beta  0.513636  0.019905  0.418037  0.188359  0.393665  0.481321  0.196138  0.507833  0.042835  0.022343  0.156208  0.191423  0.258029  0.351852  0.354429  0.301106     col0
3  alpha  beta  0.042417  0.472806  0.020204  0.411681  0.019293  0.140623  0.108013  0.136260  0.490850  0.004276  0.212354  0.470580  0.107478  0.011461  0.366930  0.100539     none
4  alpha  beta  0.287596  0.317487  0.520418  0.048656  0.033835  0.519975  0.237952  0.449427  0.211287  0.204319  0.465478  0.343144  0.408842  0.050700  0.508506  0.169613     col2
5  alpha  beta  0.154299  0.369279  0.188567  0.083248  0.004332  0.436839  0.329708  0.458652  0.329425  0.068757  0.238121  0.102302  0.497296  0.219990  0.041560  0.455992     none
6  alpha  beta  0.361315  0.010401  0.039190  0.013335  0.236268  0.268206  0.407632  0.149902  0.041239  0.481870  0.338411  0.430095  0.335075  0.405940  0.108008  0.393766     none
7  alpha  beta  0.087221  0.146438  0.202438  0.132151  0.126972  0.442878  0.011017  0.471010  0.229542  0.139989  0.476444  0.294640  0.057385  0.369218  0.433006  0.388687     none
8  alpha  beta  0.519944  0.083779  0.520042  0.382255  0.315839  0.083350  0.038597  0.114521  0.160149  0.424750  0.192139  0.457243  0.054164  0.252922  0.133647  0.240518     col2
9  alpha  beta  0.246072  0.172740  0.002171  0.046882  0.132817  0.365952  0.252149  0.516113  0.050739  0.054263  0.331012  0.493549  0.433013  0.484111  0.118902  0.475689     col7

Setup MRE

np.random.seed(2022)
test_df = pd.DataFrame(np.random.random((10, 16))) / 1.9
test_df.columns = 'col' + test_df.columns.astype(str)
tmp_df = pd.DataFrame({'A': 'alpha', 'B': 'beta'}, index=test_df.index)
test_df = pd.concat([tmp_df, test_df], axis=1)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading