Use Tiebreaker having nan values while calculating the Rank in pandas

October 11, 2023

I have a pandas dataframe, where there is a weighted score, and tiebreaker column. We have NaN values in tiebreaker column. The data frame looks like –

Name    Weighted_Score(%)   tie_breaker
   A    12.0    2.7
   B    13.0    2.8
   C    14.0    NaN
   D    14.0    3.2

Now i want to calculate the Rank based on weighted_Score(%), and if weighted_score(%) is same, then use Tie breaker.

In my case, following code is working fine until and unless there are no "NaN" values in tie_breaker column.

df['Rank'] = df[['Weighted_Score(%)', 'tie_breaker']].apply(tuple, axis=1).rank(method='dense', ascending=True, na_option='bottom').astype('int')

Above is giving the wrong Rank.

Name    Weighted_Score(%)   tie_breaker Rank
A   12.0    2.7 1
B   13.0    2.8 2
C   14.0    NaN 3
D   14.0    3.2 4

i tried converting the tiebreker nan values to 0.0, still nothing happening.

>Solution :

Your idea to fillna was good, but you would need to use np.inf to ensure the NaNs are ranked last:

df['Rank'] = (df[['Weighted_Score(%)', 'tie_breaker']].fillna(np.inf)
              .apply(tuple, axis=1).rank(method='dense', ascending=True, na_option='bottom').astype('int')
             )

Output:

  Name  Weighted_Score(%)  tie_breaker  Rank
0    A               12.0          2.7     1
1    B               13.0          2.8     2
2    C               14.0          NaN     4
3    D               14.0          3.2     3

Alternatively, you could sort_values and na_position='last', then increment a rank with duplicated and cumsum:

cols = ['Weighted_Score(%)', 'tie_breaker']
df['Rank'] = (~df[cols]
              .sort_values(by=cols, na_position='last')
              .duplicated()
             ).cumsum()