Create a boolean column in a pandas DataFrame based on percentile values of another column

I have a DataFrame with multiple columns. I want to create a boolean column that flags whether each value is at or above the 90th percentile.

My data frame also contains multiple zeros.

Example:

Name  Value
Val1  1000
Val2  910
Val3  800
Val4  700
Val5  600
Val6  500
Val7  400
Val8  300
Val9  200
Val10 100
Val11 0

Expected output

Name  Value 90thper
Val1  1000    1
Val2  910     1
Val3  800     0
Val4  700     0
Val5  600     0
Val6  500     0
Val7  400     0
Val8  300     0
Val9  200     0
Val10 100     0
Val11 0       0

>Solution :

You could use pd.Series.quantile to find the 90th percentile value, then flag every value at or above it.

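# Threshold at the 90th percentile; interpolation="lower" returns an actual data point.
val = df['Value'].quantile(.9, interpolation="lower")  # val -> 910
# Flag values at or above the threshold with 1, all others with 0.
df['90thper'] = df['Value'].ge(val).astype(int)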

#      Name  Value  90thper
# 0    Val1   1000        1
# 1    Val2    910        1
# 2    Val3    800        0
# 3    Val4    700        0
# 4    Val5    600        0
# 5    Val6    500        0
# 6    Val7    400        0
# 7    Val8    300        0
# 8    Val9    200        0
# 9   Val10    100        0
# 10  Val11      0        0
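
For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds the sample data from the question and applies the same quantile-then-compare approach. The helper name `flag_top_percentile` and its `ignore_zeros` option are hypothetical, added only for illustration. The question mentions zeros but does not say whether they should count toward the percentile, so excluding them is an assumption, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Name": [f"Val{i}" for i in range(1, 12)],
    "Value": [1000, 910, 800, 700, 600, 500, 400, 300, 200, 100, 0],
})


def flag_top_percentile(values: pd.Series, q: float = 0.9,
                        ignore_zeros: bool = False) -> pd.Series:
    """Return 1 where a value is at or above the q-th quantile, else 0.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Assumption: when ignore_zeros is True, zeros are treated as
    # placeholders and left out of the threshold calculation only.
    basis = values[values != 0] if ignore_zeros else values
    # interpolation="lower" returns an existing data point instead of
    # a value interpolated between two neighbours.
    threshold = basis.quantile(q, interpolation="lower")
    # Compare the full column, so excluded zeros still receive a flag of 0
    # whenever the threshold is positive.
    return values.ge(threshold).astype(int)


df["90thper"] = flag_top_percentile(df["Value"])
df["90thper_nonzero"] = flag_top_percentile(df["Value"], ignore_zeros=True)
print(df)
```

On this sample both columns flag only Val1 and Val2, because the threshold is 910 with or without the single zero. On data with many zeros the two thresholds can differ, so it is worth deciding which definition you actually need.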