Suppose we have the following DataFrame and function:
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 'B': [4, 5, 6, 7, 8, 9, 10, 11, 12, 13]})
def more_than(series, threshold=5):
    # Percentage of True values; a KeyError means there are no True values at all
    try:
        trues = series.value_counts()[True]
        p = trues / len(series) * 100
    except KeyError:
        p = 0
    return True if p > threshold else False
df['compare'] = df['A'] > 5
print(more_than(df['compare']))
# prints True: 5 of 10 values (50%) are True, and 50 > 5
I'd like a function similar to all(...), but with a threshold (as above): it should return True when more than threshold percent of the values are True. My version works as it should, but I wondered whether there is something built in, and probably faster.
Solution:
You can use the mean of the boolean Series, which is the fraction of True values:
(df['A'].gt(5).mean()*100)>5
Output: True
Intermediates:
# df['A'].gt(5)
[False, False, False, False, False, True, True, True, True, True]
# implicit conversion to integer
[0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
# mean = fraction of True values
0.5
# as a percentage, compared with the threshold
50.0 > 5  ->  True
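The one-liner can be wrapped in a drop-in replacement for the question's more_than. This is a minimal sketch, assuming the input is a boolean Series and that threshold is a percentage, as in the question. The name more_than_original (the question's version, kept for comparison), the column-wise variant, the 50% threshold used there (chosen only so the two columns differ) and the large test Series big are illustrative additions, not part of the original answer. The timing loop prints numbers that depend on data size and pandas version, so no speedup is claimed here.

```python
import timeit

import pandas as pd


def more_than(series: pd.Series, threshold: float = 5) -> bool:
    """Return True if more than `threshold` percent of the values in a
    boolean Series are True (a thresholded analogue of Series.all())."""
    # True counts as 1 and False as 0, so the mean is the fraction of Trues.
    return bool(series.mean() * 100 > threshold)


def more_than_original(series, threshold=5):
    # Hypothetical name for the question's value_counts-based version.
    try:
        trues = series.value_counts()[True]
        p = trues / len(series) * 100
    except KeyError:
        p = 0
    return True if p > threshold else False


df = pd.DataFrame({'A': range(1, 11), 'B': range(4, 14)})
df['compare'] = df['A'] > 5

print(more_than(df['compare']))                # True  (50% > 5%)
print(more_than(df['compare'], threshold=60))  # False (50% is not > 60%)

# Column-wise variant, analogous to DataFrame.all(): one bool per column.
# Threshold of 50% chosen only for illustration.
per_column = df[['A', 'B']].gt(5).mean().mul(100).gt(50)
print(per_column)  # A: False (50%), B: True (80%)

# Rough timing comparison on a larger, made-up boolean Series.
big = pd.Series(range(1_000_000)) > 500_000
for func in (more_than_original, more_than):
    t = timeit.timeit(lambda: func(big), number=20)
    print(f"{func.__name__}: {t:.3f} s for 20 calls")
```

Wrapping the result in bool() makes the function return a plain Python bool rather than a NumPy bool. Because mean() is a single vectorized reduction, it avoids building the intermediate counts Series that value_counts() creates; run the timing loop on your own data to see whether that matters in practice.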