Performance of numpy all/any vs testing a single element

I create an array that does not contain a single zero (ignoring the negligible chance that it does, since np.random.rand() samples [0, 1) uniformly). I want to check whether all values are equal to zero (in other use cases the array may contain only zeros). Below are some timings.

Surprisingly to me, checking a single (nonzero) element is about 2000 times faster than using np.all() or np.any(). I would assume that the compiler internally replaces np.all() by np.any() of the inverse condition and that np.any()/np.all() returns True/False at the first instance that the condition is fulfilled/violated (i.e. the compiler does not create the entire array of True or False values first).

How come np.all() and np.any() are that much slower when they would only have to check one element? Or is this because of my external knowledge that the array does not contain all zeros? In the case of an all-zeros array, I guess it might be too slow to do the boolean comparison separately for each element. I don’t know about the performance of the underlying low-level algorithms, but each element needs to be accessed once, regardless of whether the check goes one by one or builds the whole boolean array at once.

import numpy as np

np.random.seed(100)
a = np.random.rand(10418,144)
%timeit a[0,0] == 0
%timeit (a == 0).all()
%timeit np.all(a == 0)
%timeit (a != 0).any()
%timeit np.any(a != 0)

# 400 ns ± 2.08 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
# 713 µs ± 382 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# 720 µs ± 1.17 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# 711 µs ± 407 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# 723 µs ± 630 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Solution:

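When you write a == 0, numpy creates a new array of type boolean, compares each element in a with 0 and stores the result in that array. This happens before all() or any() is called, so even if they stop at the first decisive element, the full comparison pass over all 10418 × 144 elements has already been paid for. This allocation, initialization, and subsequent deallocation is the reason for the high cost.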
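To make the temporary array visible, here is a minimal sketch that builds the same kind of array as in the question and inspects the result of the comparison. The variable name mask is only illustrative, and np.random.default_rng is used here instead of the question's np.random.seed/np.random.rand, so the values differ from the original run.

```python
import numpy as np

# Same shape as in the question; the generator choice is an assumption.
rng = np.random.default_rng(100)
a = rng.random((10418, 144))

# The comparison is evaluated eagerly: it returns a complete boolean
# array with one entry per element of `a`, before any reduction runs.
mask = a == 0

print(type(mask).__name__)   # the result is an ndarray, not a lazy expression
print(mask.shape, mask.dtype)
print(mask.nbytes, "bytes allocated for the temporary")

# Only now does the reduction see the data.
print(mask.all())
```

The reduction at the end only ever sees the finished mask, which is why indexing a single element with a[0, 0] == 0 is so much cheaper.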

Note that you don’t need the explicit a == 0 in the first place. Numbers that are zero always evaluate to False, nonzero numbers to True, and this holds for floats as well as integers. np.all(a) is equivalent to np.all(a != 0). So np.all(a == 0) is equivalent to not np.any(a).
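As a quick check of that equivalence, the sketch below compares both forms on a strictly nonzero array and on an all-zeros array. The helper name is_all_zero is hypothetical and not part of numpy; it only wraps not np.any(arr).

```python
import numpy as np

def is_all_zero(arr):
    """Hypothetical helper: True if every element of arr is zero."""
    # np.any tests element truthiness directly, so no temporary
    # `arr == 0` boolean array is created.
    return not np.any(arr)

# Adding 1.0 keeps every value in [1, 2), so the array is strictly nonzero.
nonzero = np.random.default_rng(100).random((10418, 144)) + 1.0
zeros = np.zeros((10418, 144))

for arr in (nonzero, zeros):
    # Both expressions should agree for each array.
    print(is_all_zero(arr), bool(np.all(arr == 0)))
```

Timing the two forms with %timeit on your own machine is the way to see how much the temporary array costs in practice.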
