How are NumPy arrays overwritten, from the interpreter's point of view?

I wrote two simple functions to study CPython's behaviour with NumPy arrays.

Python 3.12.1 and NumPy 1.26.2, built against MKL (conda default)

import numpy as np

def foo():
    for i in range(100):
        H = np.random.rand(1000, 1000)

%timeit -r 100 foo()

def baaz():
    H = np.zeros((1000, 1000))
    for i in range(100):
        H[:, :] = np.random.rand(1000, 1000)

%timeit -r 100 baaz()
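For reference, the same comparison can be run outside IPython with the standard-library timeit module (a sketch; absolute numbers depend on hardware and the BLAS build):

```python
# Standalone version of the %timeit comparison above.
import timeit
import numpy as np

def foo():
    # Rebinds H to a freshly allocated array on every iteration.
    for i in range(100):
        H = np.random.rand(1000, 1000)

def baaz():
    # Allocates H once, then copies each new random array into it.
    H = np.zeros((1000, 1000))
    for i in range(100):
        H[:, :] = np.random.rand(1000, 1000)

t_foo = timeit.timeit(foo, number=3)
t_baaz = timeit.timeit(baaz, number=3)
print(f"foo:  {t_foo:.3f} s")
print(f"baaz: {t_baaz:.3f} s")
```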

Using the dis module to see the bytecode, by calling dis.dis(foo()) and dis.dis(baaz()), I get these two outputs.
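One pitfall worth noting here: dis.dis(foo()) passes foo's *return value* (None) rather than the function object, and per the dis documentation, dis.dis(None) falls back to disassembling the most recent traceback, which can produce confusing output unrelated to the function. A minimal sketch of the correct call:

```python
import dis

def foo():
    # Stand-in loop body; NumPy is not needed to inspect the bytecode.
    for i in range(100):
        total = i * i

# Pass the function object itself, not the result of calling it:
dis.dis(foo)

# dis.dis(foo()) would instead pass None, and dis.dis(None)
# disassembles the last traceback rather than foo's code.
```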


Initially, I believed that baaz() should run faster than foo(), since we reuse the H array instead of de-allocating and re-allocating H on each iteration. However, I consistently see that foo() is faster, and I am wondering what causes this. I cannot read the bytecode fluently, but simply comparing the dis.dis(foo()) and dis.dis(baaz()) outputs, I can see that foo() generates 13 extra lines compared to baaz().

Disassembled foo():


    671           0 RETURN_GENERATOR
                  2 POP_TOP
                  4 RESUME                   0
    
    676           6 LOAD_CONST               1 (None)
                  8 STORE_FAST               1 (lastline)
    
    677          10 LOAD_FAST                0 (code)
        -->      12 LOAD_ATTR                1 (NULL|self + co_lines)
                 32 CALL                     0
                 40 GET_ITER
            >>   42 FOR_ITER                23 (to 92)
                 46 UNPACK_SEQUENCE          3
                 50 STORE_FAST               2 (start)
                 52 STORE_FAST               3 (end)
                 54 STORE_FAST               4 (line)
    
    678          56 LOAD_FAST                4 (line)
                 58 POP_JUMP_IF_NOT_NONE     1 (to 62)
                 60 JUMP_BACKWARD           10 (to 42)
            >>   62 LOAD_FAST                4 (line)
                 64 LOAD_FAST                1 (lastline)
                 66 COMPARE_OP              55 (!=)
                 70 POP_JUMP_IF_TRUE         1 (to 74)
                 72 JUMP_BACKWARD           16 (to 42)
    
    679     >>   74 LOAD_FAST                4 (line)
                 76 STORE_FAST               1 (lastline)
    
    680          78 LOAD_FAST                2 (start)
                 80 LOAD_FAST                4 (line)
                 82 BUILD_TUPLE              2
                 84 YIELD_VALUE              1
                 86 RESUME                   1
                 88 POP_TOP
                 90 JUMP_BACKWARD           25 (to 42)
    
    677     >>   92 END_FOR
    
    681          94 RETURN_CONST             1 (None)
            >>   96 CALL_INTRINSIC_1         3 (INTRINSIC_STOPITERATION_ERROR)
                 98 RERAISE                  1
    ExceptionTable:
      4 to 58 -> 96 [0] lasti
      62 to 70 -> 96 [0] lasti
      74 to 94 -> 96 [0] lasti

Disassembled baaz():


  1           0 RESUME                   0

  2           2 LOAD_GLOBAL              0 (np)
             12 LOAD_ATTR                3 (NULL|self + zeros)
             32 LOAD_CONST               1 ((1000, 1000))
             34 CALL                     1
             42 STORE_FAST               0 (H)

  3          44 LOAD_GLOBAL              5 (NULL + range)
             54 LOAD_CONST               2 (100)
             56 CALL                     1
             64 GET_ITER
        >>   66 FOR_ITER                43 (to 156)
             70 STORE_FAST               1 (i)

  4          72 LOAD_GLOBAL              0 (np)
             82 LOAD_ATTR                6 (random)
            102 LOAD_ATTR                9 (NULL|self + rand)
            122 LOAD_CONST               3 (1000)
            124 LOAD_CONST               3 (1000)
            126 CALL                     2
            134 LOAD_FAST                0 (H)
            136 LOAD_CONST               0 (None)
            138 LOAD_CONST               0 (None)
            140 BUILD_SLICE              2
            142 LOAD_CONST               0 (None)
            144 LOAD_CONST               0 (None)
            146 BUILD_SLICE              2
            148 BUILD_TUPLE              2
            150 STORE_SUBSCR
            154 JUMP_BACKWARD           45 (to 66)

  3     >>  156 END_FOR
            158 RETURN_CONST             0 (None)

P.S.: It may not be obvious why one would expect baaz() to be faster, but this is indeed the case in a language like Julia (see Understanding Julia multi-thread / multi-process design).

> Solution:

In both cases you are creating a new array when you call np.random.rand(1000, 1000), and then de-allocating it. In the baaz case, you are additionally doing the work of copying that temporary into the pre-allocated array. Hence it is slower.
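This can be made concrete: slice assignment reuses H's buffer, but the right-hand side is still a separate allocation. A small sketch:

```python
# Sketch illustrating the point above: np.random.rand always allocates a
# fresh temporary; H[:, :] = ... then copies it into H's existing buffer.
import numpy as np

H = np.zeros((4, 4))
buf_before = H.__array_interface__['data'][0]  # address of H's data buffer

tmp = np.random.rand(4, 4)        # a new allocation either way
H[:, :] = tmp                     # extra step: element-wise copy into H

buf_after = H.__array_interface__['data'][0]
print(buf_before == buf_after)    # True: H's buffer is reused...
print(np.shares_memory(H, tmp))   # False: ...but tmp was still allocated
```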

NumPy functions provide a way to avoid this temporary; consider a simple case:

arr[:] = arr + 1

This always creates a new array, the result of the expression arr + 1, and then copies it into arr. You can avoid the temporary by using:

np.add(arr, 1, out=arr)

Just a quick example of the above:

In [31]: %%timeit -r 100 arr = np.zeros(1_000_000)
    ...: arr[:] = arr + 1
    ...:
    ...:
1.85 ms ± 375 µs per loop (mean ± std. dev. of 100 runs, 100 loops each)

In [32]: %%timeit -r 100 arr = np.zeros(1_000_000)
    ...: np.add(arr, 1, out=arr)
    ...:
    ...:
418 µs ± 29.1 µs per loop (mean ± std. dev. of 100 runs, 1,000 loops each)

Unfortunately, I don't think there is an equivalent for the legacy numpy.random functions. Numba may possibly help here, though I'm not sure how well np.random is optimized under it; it's worth a look.
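One possible exception worth checking: the newer numpy.random.Generator API does accept an out= argument on some methods (e.g. Generator.random and Generator.standard_normal), which fills an existing array in place. A hedged sketch:

```python
import numpy as np

rng = np.random.default_rng()
H = np.zeros((1000, 1000))

# Fills H in place with uniform [0, 1) samples, avoiding a temporary array
# (Generator.random supports an out= parameter, unlike legacy np.random.rand).
rng.random(out=H)
print(0.0 <= H.min() and H.max() < 1.0)  # True
```

Whether this closes the gap in practice is worth benchmarking, but it is the Generator-API counterpart of the np.add(..., out=...) pattern above.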
