I have a pandas dataframe, where there is a weighted score, and tiebreaker column. We have NaN values in tiebreaker column. The data frame looks like –
Name Weighted_Score(%) tie_breaker
A 12.0 2.7
B 13.0 2.8
C 14.0 NaN
D 14.0 3.2
Now i want to calculate the Rank based on weighted_Score(%), and if weighted_score(%) is same, then use Tie breaker.
In my case, following code is working fine until and unless there are no "NaN" values in tie_breaker column.
df['Rank'] = df[['Weighted_Score(%)', 'tie_breaker']].apply(tuple, axis=1).rank(method='dense', ascending=True, na_option='bottom').astype('int')
Above is giving the wrong Rank.
Name Weighted_Score(%) tie_breaker Rank
A 12.0 2.7 1
B 13.0 2.8 2
C 14.0 NaN 3
D 14.0 3.2 4
i tried converting the tiebreker nan values to 0.0, still nothing happening.
>Solution :
Your idea to fillna was good, but you would need to use np.inf to ensure the NaNs are ranked last:
df['Rank'] = (df[['Weighted_Score(%)', 'tie_breaker']].fillna(np.inf)
.apply(tuple, axis=1).rank(method='dense', ascending=True, na_option='bottom').astype('int')
)
Output:
Name Weighted_Score(%) tie_breaker Rank
0 A 12.0 2.7 1
1 B 13.0 2.8 2
2 C 14.0 NaN 4
3 D 14.0 3.2 3
Alternatively, you could sort_values and na_position='last', then increment a rank with duplicated and cumsum:
cols = ['Weighted_Score(%)', 'tie_breaker']
df['Rank'] = (~df[cols]
.sort_values(by=cols, na_position='last')
.duplicated()
).cumsum()