I have a dataframe with multiple columns. I want to create boolean column, flagging if the value belongs to 90th percentile and above.
My data frame also contains multiple zeros.
Example:
Name Value
Val1 1000
Val2 910
Val3 800
Val4 700
Val5 600
Val6 500
Val7 400
Val8 300
Val9 200
Val10 100
Val11 0
Expected output
Name Value 90thper
Val1 1000 1
Val2 910 1
Val3 800 0
Val4 700 0
Val5 600 0
Val6 500 0
Val7 400 0
Val8 300 0
Val9 200 0
Val10 100 0
Val11 0 0
>Solution :
You could use pd.Series.quantile to find the 90th percentile value and include all values above it.
val = df['Value'].quantile(.9, interpolation="lower") # val -> 910
df['90thper'] = df['Value'].ge(val).astype(int)
# Name Value 90thper
# 0 Val1 1000 1
# 1 Val2 910 1
# 2 Val3 800 0
# 3 Val4 700 0
# 4 Val5 600 0
# 5 Val6 500 0
# 6 Val7 400 0
# 7 Val8 300 0
# 8 Val9 200 0
# 9 Val10 100 0