Why is casting an np.uint8 array to np.float16 slower than casting it to np.float32?

I couldn’t find any explanation for this. ndarray.astype() returns a new array, so I expected the cast to np.float16 to be faster than the cast to np.float32, since it allocates less memory. However, it takes more than twice as long.

import numpy as np
original_array = np.ones([10,512,1280,3], dtype=np.uint8)

Here are the results:

%%timeit -r 10
float16_array = original_array.astype(np.float16)

93.5 ms ± 1.68 ms per loop (mean ± std. dev. of 10 runs, 10 loops each)

%%timeit -r 10
float32_array = original_array.astype(np.float32)

41.4 ms ± 278 µs per loop (mean ± std. dev. of 10 runs, 10 loops each)

>Solution :

Your CPU probably has instructions that NumPy can use for the uint8->float32 conversion. On x86, for instance, CVTDQ2PS (SSE2/AVX/AVX-512) converts between four and sixteen 32-bit integers to float32 in a single instruction, once the uint8 values have been widened to 32 bits. There is often no equally direct path to float16 that NumPy can use, so that conversion likely falls back to slower code. Half-precision float support is relatively sparse outside of GPUs.
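To check whether this applies on your machine, here is a minimal sketch that repeats the measurement from the question outside of IPython. The helper name time_cast and its parameters are made up for illustration; the absolute numbers, and even the ratio, will depend on your CPU and NumPy version.

```python
import timeit

import numpy as np


def time_cast(target_dtype, shape=(10, 512, 1280, 3), number=3, repeat=3):
    """Return the best per-call time in seconds for a uint8 -> target_dtype cast.

    Hypothetical helper: the default shape matches the array in the question.
    """
    source = np.ones(shape, dtype=np.uint8)
    # timeit.repeat returns one total time per repetition (each covering
    # `number` calls); the minimum is the least noisy estimate.
    timings = timeit.repeat(
        lambda: source.astype(target_dtype), number=number, repeat=repeat
    )
    return min(timings) / number


# Compare the two casts from the question.
for dtype in (np.float16, np.float32):
    per_call_ms = time_cast(dtype) * 1e3
    print(f"uint8 -> {np.dtype(dtype).name}: {per_call_ms:.1f} ms per cast")
```

If the float16 cast is still the slow one and you need float16 output, it is worth timing alternatives on your own hardware rather than assuming the smaller dtype is cheaper to produce.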
