I couldn’t find any explanation for this. ndarray.astype() returns a new array, so I expected the conversion to np.float16 to be faster than the conversion to np.float32, since it allocates less memory. However, it takes more than twice as long.
import numpy as np
original_array = np.ones([10, 512, 1280, 3], dtype=np.uint8)
Here are the results:
%%timeit -r 10
float16_array = original_array.astype(np.float16)
93.5 ms ± 1.68 ms per loop (mean ± std. dev. of 10 runs, 10 loops each)
%%timeit -r 10
float32_array = original_array.astype(np.float32)
41.4 ms ± 278 µs per loop (mean ± std. dev. of 10 runs, 10 loops each)
>Solution :
Your CPU probably has an instruction that numpy can use for the uint8->float32 conversion (for instance, on x86, CVTDQ2PS in SSE2/AVX/AVX512 performs between four and sixteen integer-to-float conversions in a single instruction), but doesn’t have an equivalent instruction for float16, or numpy doesn’t use one. Half-precision float support is relatively sparse outside of GPUs, so the float16 conversion most likely takes a slower path that more than cancels out the savings from allocating less memory.
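To check this on your own machine outside of IPython, here is a minimal sketch that times both conversions with the standard timeit module. The helper name time_astype and the repeat/number values are my own choices, not anything from numpy, and the numbers you get will depend on your CPU and numpy version.

```python
import timeit

import numpy as np


def time_astype(array, dtype, repeat=5, number=3):
    """Return the best per-call time, in seconds, of array.astype(dtype).

    time_astype is a hypothetical helper written for this comparison.
    """
    timings = timeit.repeat(lambda: array.astype(dtype), repeat=repeat, number=number)
    # Each entry in timings covers `number` calls, so divide to get one call.
    return min(timings) / number


if __name__ == "__main__":
    # Same shape and dtype as the array in the question.
    original_array = np.ones([10, 512, 1280, 3], dtype=np.uint8)
    for dtype in (np.float16, np.float32):
        seconds = time_astype(original_array, dtype)
        print(f"{np.dtype(dtype).name}: {seconds * 1e3:.1f} ms per conversion")
```

If float16 is slower here as well, the gap comes from the conversion itself rather than from memory allocation, since the float16 result is half the size of the float32 one.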