I have the following dataframe:
frame=pd.DataFrame(index=["Index_1","Index_2","Index_3","Index_4","Index_5","Index_6","Index_7"],
columns=["Header_1","Header_2","Rank_1","Rank_2"], data=[[4.06,4.34,1,1],[3.30,2.35,2,3],
[3.26,1.02,3,7],[3.14,2.40,4,2],[2.61,1.81,5,5],[2.52,2.33,6,4],[2.36,1.62,7,6]])
which results in:
Header_1 Header_2 Rank_1 Rank_2
Index_1 4.06 4.34 1 1
Index_2 3.30 2.35 2 3
Index_3 3.26 1.02 3 7
Index_4 3.14 2.40 4 2
Index_5 2.61 1.81 5 5
Index_6 2.52 2.33 6 4
Index_7 2.36 1.62 7 6
The Rank columns (_1,_2) show the ranking of the values (say returns) in the header columns (_1,_2) when the latter are sorted in descending order. For example at Index_2 , Header 1 – we have the second highest value of 3.3 within Header_1 column , therefore the rank of 2 at column Rank_1.
For each Index (_1,_2,_3,_4….) I want to map either Rank_1 or Rank_2 depending on where the minimum value is; In essence this achieved via pandas.DataFrame.idxmin.
If I use: frame.iloc[:,[2,3]].idxmin(axis=1) I arrive at:
Index_1 Rank_1
Index_2 Rank_1
Index_3 Rank_1
Index_4 Rank_2
Index_5 Rank_1
Index_6 Rank_2
Index_7 Rank_2
This is more or less the desired output; yet when there are equal ranks in Rank_1 and Rank_2 columns, such as at Index_1 or Index_5 I want the algorithm to check also the two Header columns and make final decision based on the highest value from Header_1 or Header_2. So at Index_1 we see that the ranks are equal (1); however Header_2 contains the higher value (4.34) compared to Header_1 (4.06). Therefore I would expect the final output to be as follows:
Index_1 Rank_2
Index_2 Rank_1
Index_3 Rank_1
Index_4 Rank_2
Index_5 Rank_1
Index_6 Rank_2
Index_7 Rank_2
Note: the difference at the last two outputs is just at Index_1.
>Solution :
Add or subtract an epsilon value to Rank_1 according to Header_1 < Header_2 or not:
eps = 0.000001
out = frame.assign(Rank_1=np.where(df['Header_1'] < df['Header_2'],
df['Rank_1'] + eps, df['Rank_1'] - eps)
).iloc[:,[2,3]].idxmin(axis=1)
print(out)
# Output
Index_1 Rank_2
Index_2 Rank_1
Index_3 Rank_1
Index_4 Rank_2
Index_5 Rank_1
Index_6 Rank_2
Index_7 Rank_2
dtype: object