Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

how to calculate mean value of each column within range of quantile?

I’d like to calculate mean value of each column but only consider values are within range of quantile, like 20%-80% only.

Here is why I have done, but not complete

df=pd.DataFrame({"A":[1,1,20,2,2,3,50,7,8,15,20,35,50,70],"B":[10,100,20,20,200,30,50,70,80,150,200,350,500,700]})
df
    A   B
0   1   10
1   1   100
2   20  20
3   2   20
4   2   200
5   3   30
6   50  50
7   7   70
8   8   80
9   15  150
10  20  200
11  35  350
12  50  500
13  70  70

then find q20 and 180 for each column using np.quantile()

q20=np.quantile(df,0.2,axis=0)
q20

array([ 2., 26.])

q80=np.quantile(df,0.8,axis=0)
q80

array([ 41., 260.])

Now How do I filter the values between q20 and q80 for each column

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I am doing below then I got an error

mask=(a>q20)&(a<q80)

TypeError: Cannot compare a Categorical for op __gt__ with type <class 'numpy.ndarray'>.
If you want to compare values, use 'np.asarray(cat) <op> other'.

Thanks for your help

>Solution :

use rank with pct

out = df[df.rank(pct=True).ge(0.2) & df.rank(pct=True).le(0.8)].mean()

out:

A     12.444444
B    110.000000
dtype: float64
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading