Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Use Tiebreaker having nan values while calculating the Rank in pandas

I have a pandas dataframe, where there is a weighted score, and tiebreaker column. We have NaN values in tiebreaker column. The data frame looks like –

Name    Weighted_Score(%)   tie_breaker
   A    12.0    2.7
   B    13.0    2.8
   C    14.0    NaN
   D    14.0    3.2

Now i want to calculate the Rank based on weighted_Score(%), and if weighted_score(%) is same, then use Tie breaker.

In my case, following code is working fine until and unless there are no "NaN" values in tie_breaker column.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df['Rank'] = df[['Weighted_Score(%)', 'tie_breaker']].apply(tuple, axis=1).rank(method='dense', ascending=True, na_option='bottom').astype('int')

Above is giving the wrong Rank.

Name    Weighted_Score(%)   tie_breaker Rank
A   12.0    2.7 1
B   13.0    2.8 2
C   14.0    NaN 3
D   14.0    3.2 4

i tried converting the tiebreker nan values to 0.0, still nothing happening.

>Solution :

Your idea to fillna was good, but you would need to use np.inf to ensure the NaNs are ranked last:

df['Rank'] = (df[['Weighted_Score(%)', 'tie_breaker']].fillna(np.inf)
              .apply(tuple, axis=1).rank(method='dense', ascending=True, na_option='bottom').astype('int')
             )

Output:

  Name  Weighted_Score(%)  tie_breaker  Rank
0    A               12.0          2.7     1
1    B               13.0          2.8     2
2    C               14.0          NaN     4
3    D               14.0          3.2     3

Alternatively, you could sort_values and na_position='last', then increment a rank with duplicated and cumsum:

cols = ['Weighted_Score(%)', 'tie_breaker']
df['Rank'] = (~df[cols]
              .sort_values(by=cols, na_position='last')
              .duplicated()
             ).cumsum()
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading