Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

sse4 packed sum between int32_t and int16_t (sign extend to int32_t)

I have the following code snippet (a gist can be found here) where I am trying to do a sum between 4 int32_t negative values and 4 int16_t values (that will be sign extend to int32_t).

    extern  exit

    global _start

    section .data

a:     dd -76, -84, -84, -132
b:     dw 406, 406, 406, 406
    
    section .text
_start:
    movdqa xmm0, [a]
    pmovsxwd xmm2, [b]
    paddq xmm0, xmm2
    ;Expected: 330, 322, 322, 274
    ;Results:  330, 323, 322, 275
    call exit

However, when going through my debugger, I couldn’t understand why the output results are different from the expected results. Any idea ?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

paddq does 64-bit qword chunks, so there’s carry across two of the 32-bit boundaries, leading to an off-by-one in the high half of each qword.

paddd is 32-bit dword chunks, matching the pmovsxwd dword element destination size. This is a SIMD operation with 4 separate adds, independent of each other.


BTW, you could have made this more efficient by folding the 16-byte aligned load into a memory operand for padd, but yeah for debugging it can help to see both inputs in registers with a separate load.

  default rel           ; use RIP-relative addressing modes when possible

_start:
   movsxwd xmm0, [b]
   paddd   xmm0, [a]

Also you’d normally put read-only arrays in section .rodata.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading