# Numpy random number generator latency

Why is the numpy generation of random numbers so much slower in the case of repeated calls compared to a single function call?

Example:

```python
import numpy as np
import timeit

if __name__ == '__main__':

    latency_normal = timeit.timeit('np.random.uniform(size=(100,))',
                                   setup='import numpy as np')
    latency_normal_loop = timeit.timeit('[np.random.uniform(size=1) for _ in range(100)]',
                                        setup='import numpy as np')

    # The Generator must be created in the timeit setup, otherwise the
    # timed statement cannot see the name `rng`.
    setup_rng = 'import numpy as np; rng = np.random.default_rng()'

    latency_generator = timeit.timeit('rng.uniform(size=(100,))', setup=setup_rng)
    latency_generator_loop = timeit.timeit('[rng.uniform(size=1) for _ in range(100)]',
                                           setup=setup_rng)

    print("latencies:\t normal: [{}, {}]\t generator: [{}, {}]".format(
        latency_normal, latency_normal_loop, latency_generator, latency_generator_loop))
```

Output:

```
latencies:       normal: [2.7388298519999807, 31.694285575999857]        generator: [2.6634575979996953, 31.0009219450003]
```

Are there any alternatives that perform better for repeated calls with small sample sizes?

### Solution

Most of the time goes to fixed per-call overhead (Python-level dispatch, argument handling, and array allocation), not to generating the random numbers themselves. To work around it, you can write a wrapper that fetches a batch of random numbers from NumPy (e.g. 100) in a single call and then hands out values from this cache. When the cache is depleted, it asks NumPy for another batch, and so on.
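A minimal sketch of such a wrapper (the class name `CachedUniform` and the `next()` method are illustrative, not part of NumPy):

```python
import numpy as np

class CachedUniform:
    """Draw uniform random numbers from NumPy in batches and hand them
    out one at a time, amortizing the per-call overhead."""

    def __init__(self, batch_size=100):
        self.batch_size = batch_size
        self.rng = np.random.default_rng()
        self.cache = self.rng.uniform(size=batch_size)
        self.index = 0

    def next(self):
        # Refill the cache with a single vectorized call once it is used up.
        if self.index >= self.batch_size:
            self.cache = self.rng.uniform(size=self.batch_size)
            self.index = 0
        value = self.cache[self.index]
        self.index += 1
        return float(value)

gen = CachedUniform(batch_size=100)
samples = [gen.next() for _ in range(250)]  # only 3 NumPy calls in total
```

Each call to `next()` is a cheap index increment; the expensive NumPy call happens only once per `batch_size` draws.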

Or, for single scalar draws, you can simply use Python's built-in `random` module, which has far lower per-call overhead than NumPy because it returns a plain float instead of allocating an array.