I have the following data frame.
import pandas as pd
date = pd.date_range('10/01/2018', periods=5, freq='D')
close = pd.DataFrame({
    'ABC': [1, 5, 3, 6, 2],
    'EFG': [12, 51, 43, 56, 22],
    'XYZ': [35, 36, 36, 36, 37],
}, index=date)
And I want to get a data frame of the same shape with ‘1’ for the largest value in each row and ‘0’ for the other two columns.
If I want the two largest values in each row, it should put ‘1’ for those two values and ‘0’ for the rest, and so on for larger counts.
Can someone please help me with this?
Thanks in advance!
Solution:
In pandas, you could rank the values in each row with pd.DataFrame.rank (largest value = rank 1) and check whether each rank is at or below your threshold n.
# Number of largest values you want to mark in each row
n = 1
# Rank each row from largest (rank 1) to smallest; method="first" breaks ties by column order
out = close.rank(axis=1, method="first", ascending=False).le(n).astype(int)
out:
ABC EFG XYZ
2018-10-01 0 0 1
2018-10-02 0 1 0
2018-10-03 0 1 0
2018-10-04 0 1 0
2018-10-05 0 0 1
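The method="first" argument decides what happens when a row contains ties: the tied value in the leftmost column gets the better rank, so exactly n cells per row are marked. The sketch below uses a made-up tied row (the frame `tied` is hypothetical, not from the question) to show how this differs from method="min", which gives tied values the same rank and can therefore mark more than n cells.

```python
import pandas as pd

# Hypothetical one-row frame with a tie for the largest value
tied = pd.DataFrame({'ABC': [5], 'EFG': [5], 'XYZ': [1]})
n = 1

# method="first": ties broken by column order, so exactly n cells are marked
first = tied.rank(axis=1, method="first", ascending=False).le(n).astype(int)

# method="min": both 5s share rank 1, so both are marked
shared = tied.rank(axis=1, method="min", ascending=False).le(n).astype(int)

print(first)   # ABC=1, EFG=0, XYZ=0
print(shared)  # ABC=1, EFG=1, XYZ=0
```

Pick whichever tie rule matches what you want the ‘1’ labels to mean.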
Also interesting is pd.DataFrame.nlargest, but it returns the top n rows ordered by one or more columns, so it doesn’t help much with marking the positions of the largest values within each row of the original dataframe.
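For comparison, Series.nlargest can still do the job if you apply it to each row yourself. This is a minimal sketch where `mark_top_n_nlargest` is a hypothetical helper name; it loops over rows in Python, so expect it to be slower than the rank one-liner on large frames.

```python
import pandas as pd

def mark_top_n_nlargest(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Return a 0/1 frame marking the n largest values in each row."""
    out = pd.DataFrame(0, index=df.index, columns=df.columns)
    for label, row in df.iterrows():
        # Series.nlargest keeps the first occurrence on ties (keep="first" by default)
        top_cols = row.nlargest(n).index
        out.loc[label, top_cols] = 1
    return out

date = pd.date_range('10/01/2018', periods=5, freq='D')
close = pd.DataFrame({
    'ABC': [1, 5, 3, 6, 2],
    'EFG': [12, 51, 43, 56, 22],
    'XYZ': [35, 36, 36, 36, 37],
}, index=date)

print(mark_top_n_nlargest(close, 2))
```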
If you need speed on a large frame, then I recommend dropping down to NumPy and picking the top n positions in each row with np.argpartition, which avoids fully sorting every row.
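A minimal sketch of that NumPy approach, assuming numeric (signed integer or float) data and 1 <= n <= number of columns; `mark_top_n_numpy` is a hypothetical helper name. Unlike method="first", argpartition makes no promise about which of several tied values gets picked.

```python
import numpy as np
import pandas as pd

def mark_top_n_numpy(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Return a 0/1 frame marking the n largest values in each row.

    Assumes 1 <= n <= number of columns; among ties the winner is arbitrary.
    """
    values = df.to_numpy()
    # Partition the negated values so the n largest originals land in the
    # first n slots of each row (in no particular order), then keep those slots.
    top_idx = np.argpartition(-values, n - 1, axis=1)[:, :n]
    mask = np.zeros(values.shape, dtype=int)
    # Write a 1 at each selected column position, row by row.
    np.put_along_axis(mask, top_idx, 1, axis=1)
    return pd.DataFrame(mask, index=df.index, columns=df.columns)

date = pd.date_range('10/01/2018', periods=5, freq='D')
close = pd.DataFrame({
    'ABC': [1, 5, 3, 6, 2],
    'EFG': [12, 51, 43, 56, 22],
    'XYZ': [35, 36, 36, 36, 37],
}, index=date)

print(mark_top_n_numpy(close, 1))
```

Since this data has no ties within a row, the result for n=1 matches the rank-based output above.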