I have a boolean-mask value-assignment problem that requires an efficient masked operation. It's a multi-dimensional mask, and I'm using `einsum` to get the result, but the operation is not very efficient, and I'm wondering if I can get some help with it.
Here is my current solution (`mask`, `truth_value`, and `false_value` are dummy data whose dtypes and shapes match my problem):

```python
mask = np.random.randn(1000, 50) > 0.5
truth_value = np.random.randn(50, 10)
false_value = np.random.randn(10)

objective = np.einsum('ij,jk->ijk', mask, truth_value) + np.einsum('ij,k->ijk', ~mask, false_value)
```
Is there a faster way to compute `objective` from `mask`, `truth_value`, and `false_value`?
While I was waiting, I figured out a faster way:

```python
objective = np.where(mask[..., np.newaxis],
                     np.broadcast_to(truth_value, (1000, 50, 10)),
                     np.broadcast_to(false_value, (1000, 50, 10)))
```
But is there an even faster alternative?
You can use the Numba JIT to do that more efficiently.
```python
import numpy as np
import numba as nb

@nb.njit('float64[:,:,::1](bool_[:,::1], float64[:,::1], float64[::1])')
def blend(mask, truth_value, false_value):
    n, m = mask.shape
    l = false_value.shape[0]
    assert truth_value.shape == (m, l)
    result = np.empty((n, m, l), dtype=np.float64)
    for i in range(n):
        for j in range(m):
            if mask[i, j]:
                result[i, j, :] = truth_value[j, :]
            else:
                result[i, j, :] = false_value[:]
    return result

mask = np.random.randn(1000, 50) > 0.5
truth_value = np.random.randn(50, 10)
false_value = np.random.randn(10)

objective = blend(mask, truth_value, false_value)
```
The computation of
objective is 4.8 times faster on my machine.
If this is not fast enough, you can try to parallelize the code by passing the parameter `parallel=True` to `@nb.njit` and using `nb.prange` instead of `range` in the `i`-based loop. This may not be faster due to the overhead of creating new threads, but on my machine (with 6 cores) the parallel version is 7.4 times faster (thread creation is fairly expensive compared to the execution time).
Another possible optimization is to write directly the result in a buffer allocated ahead of time (this is only better if you call this function multiple times with the same array size).
Here are the overall timings on my machine:
```
np.einsum:        4.32 ms
np.where:         1.72 ms
Numba sequential: 0.89 ms
Numba parallel:   0.58 ms
```