Z-Score as measurement of diverging values

I’ve been trying to use the z-score to filter out odd values in Python. I computed it both with the function SciPy offers and by hand with NumPy’s mean and std functions; the results are the same. I thought that z-scores between -1 and 1 should cover 68.3% of the samples, or maybe I’ve got the concept wrong and the z-score only describes the values themselves.

However, here is an example where I’d expect an output closer to 0.683, not 0.57.

import numpy
from scipy import stats

arr = numpy.array(range(1, 1000))

col_z_score = stats.zscore(arr)

print((~numpy.bitwise_or(-1 >= col_z_score, 1 <= col_z_score)).sum() / len(col_z_score))
print((numpy.bitwise_and(1 >= col_z_score, -1 <= col_z_score)).sum() / len(col_z_score))

Solution:

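The 68.3% rule only holds for normally distributed data.

arr = np.array(range(1, 1000)) is uniformly distributed, not normally distributed, hence the 57%: for a uniform distribution, about 1/sqrt(3) ≈ 57.7% of the values lie within one standard deviation of the mean.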
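
The uniform case can be checked numerically. The sketch below is only an illustration: it reuses the array from the question, computes the z-scores by hand with the population standard deviation, and compares the observed share with 1/sqrt(3), the value for a continuous uniform distribution. The names empirical and theoretical are made up for this example.

```python
import numpy as np

# Same evenly spaced data as in the question.
arr = np.arange(1, 1000)

# Z-scores computed by hand with the population standard deviation
# (ddof=0), which is also what scipy.stats.zscore uses by default.
z = (arr - arr.mean()) / arr.std()

# Share of values whose z-score lies in [-1, 1].
empirical = np.mean(np.abs(z) <= 1)

# Continuous uniform distribution: sigma = range / sqrt(12),
# so the share within one sigma is 2 * sigma / range = 1 / sqrt(3).
theoretical = 1 / np.sqrt(3)

print(round(empirical, 3), round(theoretical, 3))
```

Both numbers should come out close to 0.58, which matches the result in the question.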

To generate a normal distribution you can use this:

arr = np.random.normal(0, 1, 1000)

Also, bitwise_or and bitwise_and only give the right result here because the comparisons produce boolean arrays; logical_or and logical_and state the intent more clearly:

within_range = np.logical_and(col_z_score >= -1, col_z_score <= 1)

proportion_within_range = within_range.sum() / len(col_z_score)
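
Putting the pieces together, here is a minimal sketch of the whole check on normally distributed data. It assumes a seeded NumPy generator (the seed and sample size are arbitrary choices, not part of the original answer) and computes the z-scores by hand so that it runs with NumPy alone; stats.zscore should give the same values.

```python
import math
import numpy as np

# Seeded generator so the run is reproducible; the seed is arbitrary.
rng = np.random.default_rng(0)
arr = rng.normal(loc=0.0, scale=1.0, size=100_000)

# Z-scores by hand (population standard deviation, ddof=0).
col_z_score = (arr - arr.mean()) / arr.std()

within_range = np.logical_and(col_z_score >= -1, col_z_score <= 1)
proportion_within_range = within_range.sum() / len(col_z_score)

# Exact share for a normal distribution: P(-1 <= Z <= 1) = erf(1 / sqrt(2)).
expected = math.erf(1 / math.sqrt(2))

print(proportion_within_range, expected)
```

With a sample this large the observed proportion should land near the expected value of about 0.683; smaller samples will scatter more widely around it.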