I’d like to calculate mean value of each column but only consider values are within range of quantile, like 20%-80% only.
Here is why I have done, but not complete
df=pd.DataFrame({"A":[1,1,20,2,2,3,50,7,8,15,20,35,50,70],"B":[10,100,20,20,200,30,50,70,80,150,200,350,500,700]})
df
A B
0 1 10
1 1 100
2 20 20
3 2 20
4 2 200
5 3 30
6 50 50
7 7 70
8 8 80
9 15 150
10 20 200
11 35 350
12 50 500
13 70 70
then find q20 and 180 for each column using np.quantile()
q20=np.quantile(df,0.2,axis=0)
q20
array([ 2., 26.])
q80=np.quantile(df,0.8,axis=0)
q80
array([ 41., 260.])
Now How do I filter the values between q20 and q80 for each column
I am doing below then I got an error
mask=(a>q20)&(a<q80)
TypeError: Cannot compare a Categorical for op __gt__ with type <class 'numpy.ndarray'>.
If you want to compare values, use 'np.asarray(cat) <op> other'.
Thanks for your help
>Solution :
use rank with pct
out = df[df.rank(pct=True).ge(0.2) & df.rank(pct=True).le(0.8)].mean()
out:
A 12.444444
B 110.000000
dtype: float64