Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

pandas.DataFrame.idxmin – how to handle equal values

I have the following dataframe:

frame=pd.DataFrame(index=["Index_1","Index_2","Index_3","Index_4","Index_5","Index_6","Index_7"],
                   columns=["Header_1","Header_2","Rank_1","Rank_2"], data=[[4.06,4.34,1,1],[3.30,2.35,2,3],
                    [3.26,1.02,3,7],[3.14,2.40,4,2],[2.61,1.81,5,5],[2.52,2.33,6,4],[2.36,1.62,7,6]])

which results in:

         Header_1   Header_2    Rank_1  Rank_2
Index_1    4.06      4.34          1      1
Index_2    3.30      2.35          2      3
Index_3    3.26      1.02          3      7
Index_4    3.14      2.40          4      2
Index_5    2.61      1.81          5      5
Index_6    2.52      2.33          6      4
Index_7    2.36      1.62          7      6

The Rank columns (_1,_2) show the ranking of the values (say returns) in the header columns (_1,_2) when the latter are sorted in descending order. For example at Index_2 , Header 1 – we have the second highest value of 3.3 within Header_1 column , therefore the rank of 2 at column Rank_1.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

For each Index (_1,_2,_3,_4….) I want to map either Rank_1 or Rank_2 depending on where the minimum value is; In essence this achieved via pandas.DataFrame.idxmin.
If I use: frame.iloc[:,[2,3]].idxmin(axis=1) I arrive at:

Index_1    Rank_1
Index_2    Rank_1
Index_3    Rank_1
Index_4    Rank_2
Index_5    Rank_1
Index_6    Rank_2
Index_7    Rank_2

This is more or less the desired output; yet when there are equal ranks in Rank_1 and Rank_2 columns, such as at Index_1 or Index_5 I want the algorithm to check also the two Header columns and make final decision based on the highest value from Header_1 or Header_2. So at Index_1 we see that the ranks are equal (1); however Header_2 contains the higher value (4.34) compared to Header_1 (4.06). Therefore I would expect the final output to be as follows:

Index_1    Rank_2
Index_2    Rank_1
Index_3    Rank_1
Index_4    Rank_2
Index_5    Rank_1
Index_6    Rank_2
Index_7    Rank_2

Note: the difference at the last two outputs is just at Index_1.

>Solution :

Add or subtract an epsilon value to Rank_1 according to Header_1 < Header_2 or not:

eps = 0.000001

out = frame.assign(Rank_1=np.where(df['Header_1'] < df['Header_2'],
                                   df['Rank_1'] + eps, df['Rank_1'] - eps)
                  ).iloc[:,[2,3]].idxmin(axis=1)
print(out)

# Output
Index_1    Rank_2
Index_2    Rank_1
Index_3    Rank_1
Index_4    Rank_2
Index_5    Rank_1
Index_6    Rank_2
Index_7    Rank_2
dtype: object
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading