How to get the n largest values in each row of a pandas DataFrame and label them '1', else '0'?

I have the following data frame.

import pandas as pd

date = pd.date_range('10/01/2018', periods=5, freq='D')

close = pd.DataFrame({
    'ABC': [1, 5, 3, 6, 2],
    'EFG': [12, 51, 43, 56, 22],
    'XYZ': [35, 36, 36, 36, 37],
}, index=date)

I want to get a data frame of the same shape with ‘1’ for the largest value in each row and ‘0’ for the other two columns.

If I want the two largest values in each row, it should label those two values ‘1’ and the rest ‘0’, and so on for larger n.

Can someone please help me with this?

Thanks in advance!

Solution:

In pandas, you can rank the values within each row in descending order with pd.DataFrame.rank and check whether each rank is at most n.

# Number of largest values you want to mark
n = 1

# Rank each row in descending order (rank 1 = largest), then flag ranks <= n
out = close.rank(axis=1, method="first", ascending=False).le(n).astype(int)

out:

            ABC  EFG  XYZ
2018-10-01    0    0    1
2018-10-02    0    1    0
2018-10-03    0    1    0
2018-10-04    0    1    0
2018-10-05    0    0    1
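
For the two-largest case from the question, only n needs to change. The sketch below rebuilds the sample frame and also uses a small made-up frame (`tied`, not part of the original question) to show why method="first" is a reasonable choice: it assigns distinct ranks, so each row gets exactly n ones, whereas method="min" can flag more than n values when ties fall on the cut-off.

```python
import pandas as pd

date = pd.date_range('10/01/2018', periods=5, freq='D')
close = pd.DataFrame({
    'ABC': [1, 5, 3, 6, 2],
    'EFG': [12, 51, 43, 56, 22],
    'XYZ': [35, 36, 36, 36, 37],
}, index=date)

# Mark the two largest values in each row.
n = 2
top2 = close.rank(axis=1, method="first", ascending=False).le(n).astype(int)
# ABC is the smallest value in every row, so ABC is all 0 and EFG, XYZ are all 1.

# Hypothetical one-row frame with a tie for the largest value.
tied = pd.DataFrame({'a': [5], 'b': [5], 'c': [1]})
first = tied.rank(axis=1, method="first", ascending=False).le(1).astype(int)
minimum = tied.rank(axis=1, method="min", ascending=False).le(1).astype(int)
# `first` flags exactly one of the tied values; `minimum` flags both.
```

If tied values should all be flagged, method="min" may be the better fit; that depends on what the labels are used for.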

pd.DataFrame.nlargest is also worth knowing, but it returns the n rows with the largest values in the given columns, so it doesn’t help much with marking the positions of the largest values within each row of the original data frame.

If you need speed, then I recommend dropping down to numpy and doing some np.argpartition trickery.
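
One way the np.argpartition idea could look is sketched below. The helper name `mark_n_largest` is made up for illustration, and the sketch assumes 1 <= n <= number of columns and no NaN values. Note that np.argpartition always picks exactly n positions per row and breaks ties arbitrarily.

```python
import numpy as np
import pandas as pd

def mark_n_largest(frame: pd.DataFrame, n: int) -> pd.DataFrame:
    """Return a 0/1 frame with 1 at the n largest values of each row."""
    values = frame.to_numpy()
    # Column positions of the n largest entries per row (in no particular order).
    top = np.argpartition(values, -n, axis=1)[:, -n:]
    mask = np.zeros(values.shape, dtype=int)
    # Write 1 at those positions, row by row.
    np.put_along_axis(mask, top, 1, axis=1)
    return pd.DataFrame(mask, index=frame.index, columns=frame.columns)

date = pd.date_range('10/01/2018', periods=5, freq='D')
close = pd.DataFrame({
    'ABC': [1, 5, 3, 6, 2],
    'EFG': [12, 51, 43, 56, 22],
    'XYZ': [35, 36, 36, 36, 37],
}, index=date)

out_np = mark_n_largest(close, 1)   # same 0/1 pattern as the rank-based result for n = 1
out_np2 = mark_n_largest(close, 2)  # two ones per row
```

Whether this is actually faster than the rank-based version depends on the size of the frame, so it is worth timing on real data before switching.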
