Combine two NumPy arrays element-wise based on the values of a third array

February 22, 2023

I have three NumPy arrays, and I would like to combine two of them element-wise depending on the values in the third.

Specifically, I have the following arrays:

import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([5, 4, 3, 2, 1])
c = np.array([0, 1, 2, 3, 4])

I would like to get a new array that holds the difference b - c at positions where a is less than or equal to 3, and the sum b + c at positions where a is greater than 3.
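To make the rule concrete, here is a small sketch that works it out by hand for the arrays above. The names `mask`, `result` and `expected` are illustrative only and do not come from the original code.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([5, 4, 3, 2, 1])
c = np.array([0, 1, 2, 3, 4])

# Positions where the difference should be used instead of the sum.
mask = a <= 3  # True for the first three elements, False for the last two

# Apply the rule one element at a time (plain Python, for clarity only).
result = np.array([bi - ci if ai <= 3 else bi + ci
                   for ai, bi, ci in zip(a, b, c)])

# First three entries: 5-0, 4-1, 3-2; last two entries: 2+3, 1+4.
expected = np.array([5, 3, 1, 5, 5])
print(result)
```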

This is easy to do with a loop, but I need code that is faster. I have already tried np.where, but in my measurements it is not faster either.

import timeit

t = timeit.default_timer()
for _ in range(10000):
    d1 = np.zeros(5)
    for i in range(5):
        if a[i] <= 3:
            d1[i] = b[i] - c[i]
        else:
            d1[i] = b[i] + c[i]
print(f"Time: {timeit.default_timer() - t} s")  # prints: 0.025416199998289812 s

t = timeit.default_timer()
for _ in range(10000):
    d2 = np.where(a <= 3, b - c, b + c)
print(f"Time: {timeit.default_timer() - t} s")  # prints: 0.02637680000043474 s

Am I using np.where wrong or are there other ways to make this code faster?

Solution:

Your dataset is too small for the comparison to be meaningful. With only five elements, you are mostly measuring NumPy's fixed per-call overhead rather than the per-element work.

Here is the same comparison on arrays of 50k items:

# for loop
18.6 ms ± 1.1 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

# numpy.where
91.8 µs ± 1.86 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
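The exact benchmark code is not shown above, so the following is only a sketch of how such a comparison could be reproduced. It assumes randomly generated integer arrays of 50,000 items; the helper names `with_loop` and `with_where`, the seed, and the repeat counts are my own choices, and the absolute timings will differ from machine to machine.

```python
import timeit

import numpy as np

# Assumed test data: 50k random integers (not the data from the answer).
rng = np.random.default_rng(0)
n = 50_000
a = rng.integers(1, 6, size=n)   # values 1..5, like the original `a`
b = rng.integers(0, 10, size=n)
c = rng.integers(0, 10, size=n)


def with_loop(a, b, c):
    """Element-by-element Python loop, as in the question."""
    out = np.zeros(len(a))
    for i in range(len(a)):
        if a[i] <= 3:
            out[i] = b[i] - c[i]
        else:
            out[i] = b[i] + c[i]
    return out


def with_where(a, b, c):
    """Vectorized version: both branches are computed, then selected."""
    return np.where(a <= 3, b - c, b + c)


# Both approaches should agree before their timings are compared.
assert np.array_equal(with_loop(a, b, c), with_where(a, b, c))

# Average time per call; the loop gets fewer repetitions because it is slow.
loop_time = timeit.timeit(lambda: with_loop(a, b, c), number=5) / 5
where_time = timeit.timeit(lambda: with_where(a, b, c), number=100) / 100

print(f"for loop:    {loop_time * 1e3:.2f} ms per call")
print(f"numpy.where: {where_time * 1e6:.2f} µs per call")
```

Note that np.where evaluates both b - c and b + c for every element before selecting between them. That extra work is usually negligible next to the cost of a Python-level loop, which is why the gap widens as the arrays grow.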