Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

For a binary array sum(array) and numpy.count_nonzero(array) give different answers for big arrays, when array is uint8. Why?

I have 3D arrays filled with ones and zeros (created through pyellipsoid). The array is uint8.
I wanted to know the number of 1s.
I used sum(sum(sum(array))) to do this and it worked fine for small arrays (up to approx.5000 entries).
I compared sum(sum(sum(array))) to numpy.count_nonzero(array) for a known number of nonzero entries.
For bigger arrays the answers from "sum" are always wrong and lower than they should be.

If I use float64 arrays it works fine with big arrays.
If I change the data type to uint8 it does not work.

Why is that?
I am sure there is a very simple reason, but I can’t find an answer.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Small array example:

test = numpy.zeros((2,2,2))
test[0,0,0] = 1  
test[1,0,0] = 1
In: test
Out: 
array([[[1., 0.],
        [0., 0.]],
In: sum(sum(sum(test)))
Out: 2.0

Big example (8000 entries, only one zero, 7999 ones):

test_big=np.ones((20,20,20))
test_big[0,0,0] = 0
test_big
Out[77]: 
array([[[0., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.],
        ...,
        [1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.]],

       [[1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.],
        ...,
        [1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.]],

       [[1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.],
        ...,
        [1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.]],

       ...,

       [[1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.],
        ...,
        [1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.]],

       [[1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.],
        ...,
        [1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.]],

       [[1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.],
        ...,
        [1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.]]])
In: sum(sum(sum(test_big)))
Out: 7999.0

So far so good. Here, the data type of the sum output is float64. But if I now change the data type of the array to the type that is used with pyellipsoid (uint8)…

In: test_big = test_big.astype('uint8')
In: sum(sum(sum(test_big)))
Out: 2879

So obviously 2879 is not 7999. Here, the data type of the sum output is int32 (-2147483648 to 2147483647) so this should be big enough for 7999, right…?
I guess it has something to do with the data type, but how? Why?

Any answer would be appreciated. This is not urgent. I am just curious what I am missing.
(It’s my first post, so I hope this is understandable).
Thanks!

(I am using spyder in anaconda on windows if that is of any help.)

>Solution :

The issue is as You guessed – there is an integer overflow. If You take a look at sum(sum(test_big)) You will notice that the values are wrong there.

The part that is wrong is that integer overflow can occur in You sum() functions which are taking the partial sums

What I would suggest is making a sum of this array using np.sum() as it does give an appropriate sum despite of data type

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading