I have three NumPy arrays and would like to combine two of them element-wise, based on the values in the third.
Specifically, I have the following arrays:
import numpy as np
a = np.array([1, 2, 3, 4, 5])
b = np.array([5, 4, 3, 2, 1])
c = np.array([0, 1, 2, 3, 4])
As a result, I would like to get a new array which is the difference of b and c when a is less than or equal to 3, and which is the sum of b and c when a is greater than 3.
This is easy to do with a loop, but I need faster code. I have already tried np.where, but it is not faster for me either.
import timeit

# Version 1: explicit Python loop over the elements
t = timeit.default_timer()
for _ in range(10000):
    d1 = np.zeros(5)
    for i in range(5):
        if a[i] <= 3:
            d1[i] = b[i] - c[i]
        else:
            d1[i] = b[i] + c[i]
print(f"Time: {timeit.default_timer() - t} s") # prints: 0.025416199998289812 s

# Version 2: vectorized selection with np.where
t = timeit.default_timer()
for _ in range(10000):
    d2 = np.where(a <= 3, b - c, b + c)
print(f"Time: {timeit.default_timer() - t} s") # prints: 0.02637680000043474 s
Am I using np.where wrong or are there other ways to make this code faster?
>Solution :
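Your dataset is too small for a meaningful comparison. With only five elements, you are mostly measuring NumPy's fixed per-call overhead rather than the per-element work, so the vectorized version has no chance to pull ahead.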
Here is the same comparison on arrays of 50k items:
# for loop
18.6 ms ± 1.1 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
# numpy.where
91.8 µs ± 1.86 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
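For reference, here is a minimal sketch of how such a comparison could be reproduced with the standard timeit module. The array contents (random integers from a seeded generator), the helper names loop_version and where_version, and the repeat counts are assumptions chosen for illustration, not the exact setup behind the numbers above. Absolute timings will vary by machine.

```python
import timeit

import numpy as np


def loop_version(a, b, c):
    """Element-by-element Python loop, as in the question."""
    d = np.zeros(len(a))
    for i in range(len(a)):
        if a[i] <= 3:
            d[i] = b[i] - c[i]
        else:
            d[i] = b[i] + c[i]
    return d


def where_version(a, b, c):
    """Vectorized selection: b - c where a <= 3, otherwise b + c."""
    return np.where(a <= 3, b - c, b + c)


# Assumed test data: 50k random integers in the same ranges as the question.
rng = np.random.default_rng(0)
n = 50_000
a = rng.integers(1, 6, size=n)
b = rng.integers(1, 6, size=n)
c = rng.integers(0, 5, size=n)

# Both versions should produce the same values before we compare speed.
assert np.array_equal(loop_version(a, b, c), where_version(a, b, c))

# Repeat counts are arbitrary; the loop is slow, so it gets fewer runs.
loop_runs, where_runs = 5, 200
t_loop = timeit.timeit(lambda: loop_version(a, b, c), number=loop_runs) / loop_runs
t_where = timeit.timeit(lambda: where_version(a, b, c), number=where_runs) / where_runs

print(f"for loop: {t_loop * 1e3:.2f} ms per call")
print(f"np.where: {t_where * 1e6:.2f} µs per call")
```

Note that np.where evaluates both b - c and b + c over the full arrays before selecting, which is part of the fixed cost that dominates on tiny inputs and becomes negligible relative to the loop on large ones.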