Say I have a NumPy array of floats, with both positive and negative values. I have two numbers, say a and b, with a <= b, so [a, b] is a (closed) number range.
I want to make all of the array fall into the range [a, b]. More specifically, I want to replace every value outside the range with the corresponding endpoint.
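I am not trying to rescale the values to fit them into the range. In Python that would be (with lo and hi the minimum and maximum of arr):
[a + (e - lo) * (b - a) / (hi - lo) for e in arr]
Or in NumPy:
a + (arr - arr.min()) * (b - a) / (arr.max() - arr.min())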
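To make the distinction concrete, here is a small sketch that applies both operations to the same data. The helper names rescale and clamp are illustrative only; rescale is min-max scaling of the array's actual range onto [a, b], and clamp uses np.where to pick a, b or the original value. np.where is used here just to keep the clamp self-contained, not because it is claimed to be faster than the masked assignments.

```python
import numpy as np

def rescale(arr, a, b):
    """Min-max scaling: map [arr.min(), arr.max()] linearly onto [a, b]."""
    lo, hi = arr.min(), arr.max()
    return a + (arr - lo) * (b - a) / (hi - lo)

def clamp(arr, a, b):
    """Clamping: values below a become a, values above b become b, others unchanged."""
    return np.where(arr < a, a, np.where(arr > b, b, arr))

arr = np.array([-4.0, 0.0, 2.0, 6.0, 12.0])
print(rescale(arr, 0, 10))  # every value moves
print(clamp(arr, 0, 10))    # only -4.0 and 12.0 change
```

Rescaling changes every element, while clamping only touches the elements that lie outside [a, b].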
I am trying to replace all values lower than a with a and all values higher than b with b, while leaving all other values unchanged. I can do it in a single list comprehension in Python:
[e if a <= e <= b else (a if e < a else b) for e in arr]
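The same clamp can be written more compactly in pure Python as max(a, min(e, b)). For a <= b and non-NaN values this gives the same results as the conditional expression: if e < a, then min(e, b) is e and max(a, e) is a; if e > b, the result is b; otherwise it is e. The function names below are illustrative.

```python
def clamp_list(values, a, b):
    # Compact form: clip from above with min, then from below with max
    return [max(a, min(e, b)) for e in values]

def clamp_list_conditional(values, a, b):
    # The comprehension from the question, kept for comparison
    return [e if a <= e <= b else (a if e < a else b) for e in values]

sample = [-3.5, 0.0, 5.25, 10.0, 12.0]
print(clamp_list(sample, 0, 10))
print(clamp_list_conditional(sample, 0, 10))
```

This is still a Python-level loop, so it should not be expected to be much faster than the original comprehension.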
I can do the same in NumPy with two boolean-mask assignments (each comparison broadcasts the scalar against the array):
arr[arr < a] = a
arr[arr > b] = b
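Even though NumPy is much faster than Python, the above makes two separate passes over the array (each one a comparison followed by a masked assignment) rather than one, so although it runs in compiled code it does more work than necessary.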
What is a faster way?
I have measured this several times, and pure Python is indeed much slower, as expected:
In [1]: import numpy as np
In [2]: numbers = np.random.random(4096) * 1024
In [3]: %timeit numbers[numbers < 256]
16.1 µs ± 219 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
In [4]: %timeit numbers[numbers > 512]
20.9 µs ± 526 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [5]: %timeit [e if 256 <= e <= 512 else (256 if e < 256 else 512) for e in numbers]
927 µs ± 101 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
In [6]: %timeit [e if 256 <= e <= 512 else (256 if e < 256 else 512) for e in numbers.tolist()]
684 µs ± 38.2 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Solution:
You can use np.clip. From its documentation:
Given an interval, values outside the interval are clipped to the interval edges. For example, if an interval of [0, 1] is specified, values smaller than 0 become 0, and values larger than 1 become 1.
It is also faster than the two masked assignments, especially for larger arrays, as the measurements below show.
Code Example:
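import numpy as np
arr = np.array([-3, 5, 10, -7, 2, 8, -12, 15])
a = 0   # lower bound
b = 10  # upper bound
# Values below a become a, values above b become b, the rest are unchanged
new_arr = np.clip(arr, a, b)
print(new_arr)  # [ 0  5 10  0  2  8  0 10]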
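np.clip has a few variants that may be useful here. The sketch below reuses the example array and the bounds 0 and 10. It shows the ndarray.clip method form, one-sided clipping by passing None for the unused bound, and in-place clipping with the out parameter, which writes the result back into arr instead of allocating a new array. In-place clipping assumes the result can be cast back to arr's dtype; float bounds on an integer array, for example, would not be accepted under NumPy's default casting rule.

```python
import numpy as np

arr = np.array([-3, 5, 10, -7, 2, 8, -12, 15])

# Method form: equivalent to np.clip(arr, 0, 10)
via_method = arr.clip(0, 10)

# One-sided clipping: pass None for the bound you do not need
lower_only = np.clip(arr, 0, None)

# In-place: the result is written back into arr itself
np.clip(arr, 0, 10, out=arr)

# The [0, 1] example from the documentation quoted above
unit = np.clip(np.array([-0.5, 0.2, 1.7]), 0, 1)
print(via_method, lower_only, arr, unit)
```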
Time measurements (in seconds):
Array size   List comprehension   NumPy broadcasts   np.clip()
1,000        0.0115               0.0009             0.0009
10,000       0.1137               0.0069             0.0017
100,000      1.3205               0.1152             0.0107
1,000,000    13.8250              1.0064             0.1973
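The answer does not show how these timings were collected. A minimal harness along the following lines could reproduce the comparison. The function names, the bounds 256 and 512 (borrowed from the question), the array sizes and the repetition count are all assumptions. It first checks that all three methods produce the same array. The masked version works on a copy so that every repetition sees unclipped input, which adds an allocation the original in-place assignments did not have. Absolute numbers will depend on the machine and the NumPy version.

```python
import timeit

import numpy as np

def clip_list_comprehension(arr, a, b):
    # Method 1: pure Python, one element at a time
    return np.array([e if a <= e <= b else (a if e < a else b) for e in arr.tolist()])

def clip_masked(arr, a, b):
    # Method 2: two boolean masks and two masked assignments, applied to a copy
    out = arr.copy()
    out[out < a] = a
    out[out > b] = b
    return out

def clip_numpy(arr, a, b):
    # Method 3: a single np.clip call
    return np.clip(arr, a, b)

METHODS = {
    "List comprehension": clip_list_comprehension,
    "Masked assignments": clip_masked,
    "np.clip()": clip_numpy,
}

def benchmark(size, a=256.0, b=512.0, number=10, seed=0):
    """Return {method name: mean seconds per call} for a random array of the given size."""
    rng = np.random.default_rng(seed)
    arr = rng.random(size) * 1024
    expected = np.clip(arr, a, b)
    results = {}
    for name, fn in METHODS.items():
        # Sanity check: every method must produce the same clipped array
        assert np.array_equal(fn(arr, a, b), expected), name
        total = timeit.timeit(lambda: fn(arr, a, b), number=number)
        results[name] = total / number
    return results

if __name__ == "__main__":
    for size in (1_000, 10_000, 100_000):
        for name, seconds in benchmark(size).items():
            print(f"size={size:>9,}  {name:<20} {seconds:.6f} s")
```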