How to get the (n) largest values from a pandas data frame? And label them as '1' else '0'

April 22, 2023

I have the following data frame.

import pandas as pd

date = pd.date_range('10/01/2018', periods=5, freq='D')

close = pd.DataFrame({
    'ABC': [1, 5, 3, 6, 2],
    'EFG': [12, 51, 43, 56, 22],
    'XYZ': [35, 36, 36, 36, 37],
}, index=date)

And I want to get the full data frame with ‘1’ for the largest value in each row and ‘0’ for the other two columns.

If I want the two largest values in each row, those two should be labeled ‘1’ and the rest ‘0’, and so on for larger n.

Can someone please help me with this?

Thanks in advance!

Solution:

In pandas, you can rank the values within each row in descending order with pd.DataFrame.rank and then check whether each rank is at most n.

# Number of largest values you want to mark
n = 1

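# Solution: rank each row from largest to smallest; method="first" breaks
# ties by column order, so exactly n cells per row are marked with 1
out = close.rank(axis=1, method="first", ascending=False).le(n).astype(int)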

out:

            ABC  EFG  XYZ
2018-10-01    0    0    1
2018-10-02    0    1    0
2018-10-03    0    1    0
2018-10-04    0    1    0
2018-10-05    0    0    1
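
For the two-largest case from the question, the same expression should work with n = 2. The snippet below is a self-contained sketch that rebuilds the example frame (assuming the dates are meant to be a plain DatetimeIndex); the name out2 is only illustrative.

```python
import pandas as pd

# Rebuild the example frame from the question
date = pd.date_range('10/01/2018', periods=5, freq='D')
close = pd.DataFrame({
    'ABC': [1, 5, 3, 6, 2],
    'EFG': [12, 51, 43, 56, 22],
    'XYZ': [35, 36, 36, 36, 37],
}, index=date)

# Mark the two largest values in each row
n = 2
out2 = close.rank(axis=1, method="first", ascending=False).le(n).astype(int)
print(out2)
```

With this data, EFG and XYZ hold the two largest values in every row, so those two columns are all 1 and ABC is all 0.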

Also interesting is pd.DataFrame.nlargest, but it returns the n rows with the largest values in the given columns, so it doesn’t help much with marking the positions of the largest values within each row of the original dataframe.

If you need speed, consider dropping down to NumPy and using np.argpartition, which finds the positions of the n largest entries in each row without fully sorting it.
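
One way the np.argpartition idea might look is sketched below. The helper name mark_n_largest is hypothetical and not part of pandas or NumPy. It assumes 1 <= n <= number of columns and no NaN values. Unlike rank with method="first", ties at the cutoff are resolved in whatever order argpartition happens to produce.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question
date = pd.date_range('10/01/2018', periods=5, freq='D')
close = pd.DataFrame({
    'ABC': [1, 5, 3, 6, 2],
    'EFG': [12, 51, 43, 56, 22],
    'XYZ': [35, 36, 36, 36, 37],
}, index=date)


def mark_n_largest(frame, n):
    """Hypothetical helper: 1 for the n largest values in each row, else 0."""
    values = frame.to_numpy()
    # After partitioning around position -n, the last n slots of each row
    # hold the column indices of the n largest values (in no particular order)
    top_idx = np.argpartition(values, -n, axis=1)[:, -n:]
    mask = np.zeros(values.shape, dtype=int)
    # Write a 1 at each selected column position, row by row
    np.put_along_axis(mask, top_idx, 1, axis=1)
    return pd.DataFrame(mask, index=frame.index, columns=frame.columns)


out_np = mark_n_largest(close, 1)
print(out_np)
```

For this example frame, mark_n_largest(close, 1) should match the rank-based result shown above. Whether it is actually faster depends on the frame's shape, so it is worth timing both on your own data.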