Pandas all() but with a threshold

April 17, 2024

Suppose we have the following DataFrame and program logic:

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 'B': [4, 5, 6, 7, 8, 9, 10, 11, 12, 13]})


def more_than(series, threshold=5):
    try:
        trues = series.value_counts()[True]
        p = trues / len(series) * 100
    except KeyError:
        p = 0
    
    return p > threshold

df['compare'] = df['A'] > 5

print(more_than(df['compare']))

# True here

I’d like to have a function similar to all(...) but with a threshold: it should return True when more than a given percentage of the values are True (as above). The code works as it should, but I wonder whether there is anything built in, and possibly faster, for this.

>Solution :

You can use:

(df['A'].gt(5).mean()*100)>5

Output: True

Intermediates:

# df['A'].gt(5)
[False, False, False, False, False, True, True, True, True, True]

# implicit conversion to integer
[0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# average (the fraction of True values)
0.5

# as a percentage, compared with the threshold
50.0 > 5  # True
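
If you want to keep the call style from the question, the one-liner can be wrapped in a small helper. This is only a sketch: the name more_than and the percentage-based threshold are taken from the question, while the parameter name mask and the bool(...) conversion are my own choices.

```python
import pandas as pd


def more_than(mask: pd.Series, threshold: float = 5) -> bool:
    """Return True if more than `threshold` percent of `mask` is True."""
    # The mean of a boolean Series is the fraction of True values,
    # so multiplying by 100 gives the percentage.
    # bool(...) converts the NumPy boolean into a plain Python bool.
    return bool(mask.mean() * 100 > threshold)


df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})
mask = df['A'].gt(5)

print(more_than(mask))                # 50% of the values are True, so this prints True
print(more_than(mask, threshold=50))  # 50 is not strictly greater than 50, so this prints False
```

Note that the comparison is strict, as in the original function, so a Series that is exactly at the threshold returns False.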