I recently wrote a program that does some floating-point calculations in Arm64 assembly.
Since the numbers I’m dealing with can become really tiny, I now want to optimise the code so that it uses as much precision as possible.
I found out that the NEON unit has 128-bit floating-point registers instead of the 64 bits I’m currently working with, so I looked for a way to use these for calculations. Every website I looked at tells me this should be possible, but when I try something like
fmul v0, v1, v2
I just get "error: invalid operand for instruction".
I’m using an M1 chip, which should be capable of executing NEON instructions, and when I change it to
fmul v0.2d, v1.2d, v2.2d
there’s no problem at all.
Does anyone have an idea what I’m doing wrong? Or is it just impossible to use all the 128 bits of these registers at once?
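True, the NEON registers are 128 bits wide, but the widest floating-point element they operate on is 64 bits (double precision). That is why the assembler insists on an arrangement specifier such as .2d: the register is treated as two independent 64-bit lanes, not as a single 128-bit number.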
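To make the lane layout concrete, here is a small sketch in the same syntax as the question. The register numbers are arbitrary and chosen only for illustration; the comments describe what each form computes.

```
fmul d0, d1, d2             // scalar: one 64-bit double held in the low half of each register
fmul v0.2d, v1.2d, v2.2d    // vector: two independent 64-bit double multiplies
fmul v0.4s, v1.4s, v2.4s    // vector: four independent 32-bit float multiplies
```

In every form the precision per element is at most 64 bits. The extra register width buys more results per instruction, not more digits per result.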
No consumer architecture known to me can do arithmetic on a 128-bit floating-point type in hardware.
PS: Is there a quad-precision data type to begin with? I’m curious.