I want to add two unsigned vectors using AVX2:

```
__m256i i1 = _mm256_loadu_si256((__m256i *) si1);
__m256i i2 = _mm256_loadu_si256((__m256i *) si2);
```

However, I need wrapping overflow rather than the saturation that `_mm256_adds_epu16` performs, so that the result is identical to the non-vectorized code. Is there a solution for that?

### >Solution :

Use normal binary wrapping `_mm256_add_epi16` instead of saturating `adds`.
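As a minimal sketch of that replacement, assuming GCC or Clang and a CPU with AVX2: the function names `add_u16_wrap` and `add_u16_scalar` are hypothetical, and the `target("avx2")` attribute is only there so the example compiles without `-mavx2` on the command line.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: adds 16 uint16_t lanes with wraparound (mod 2^16).
   The target attribute (GCC/Clang) enables AVX2 for this function only;
   the CPU running it must still support AVX2. */
__attribute__((target("avx2")))
void add_u16_wrap(const uint16_t *si1, const uint16_t *si2, uint16_t *dst)
{
    __m256i i1 = _mm256_loadu_si256((const __m256i *)si1);
    __m256i i2 = _mm256_loadu_si256((const __m256i *)si2);
    __m256i sum = _mm256_add_epi16(i1, i2);   /* vpaddw: wraps, no saturation */
    _mm256_storeu_si256((__m256i *)dst, sum);
}

/* Scalar reference with the same wrapping behaviour. */
void add_u16_scalar(const uint16_t *si1, const uint16_t *si2, uint16_t *dst)
{
    for (int i = 0; i < 16; i++)
        dst[i] = (uint16_t)(si1[i] + si2[i]);
}
```

Both functions process exactly 16 elements, so a real loop over a longer array would still need to handle any leftover tail.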

Two’s complement and unsigned addition/subtraction are the same binary operation; that’s one of the reasons modern computers use two’s complement. As the asm manual entry for `vpaddw` mentions, the instructions can be used on signed or unsigned integers. (The intrinsics guide entry doesn’t mention signedness at all, so it is less helpful at clearing up this confusion.)

Compares like `_mm_cmpgt_epi32` are sensitive to signedness, but math operations (and `cmpeq`) aren’t.
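To illustrate the difference, here is a small sketch using SSE2 only. The helper names `cmpgt_lane0` and `cmpeq_lane0` are made up for this example; each returns lane 0 of the comparison mask.

```c
#include <emmintrin.h>
#include <stdint.h>

/* Hypothetical helper: lane 0 of the _mm_cmpgt_epi32 mask
   (all-ones if a > b as SIGNED 32-bit values, else zero). */
uint32_t cmpgt_lane0(uint32_t a, uint32_t b)
{
    __m128i m = _mm_cmpgt_epi32(_mm_set1_epi32((int)a), _mm_set1_epi32((int)b));
    return (uint32_t)_mm_cvtsi128_si32(m);
}

/* Hypothetical helper: lane 0 of the _mm_cmpeq_epi32 mask.
   Equality only looks at bit patterns, so signedness does not matter. */
uint32_t cmpeq_lane0(uint32_t a, uint32_t b)
{
    __m128i m = _mm_cmpeq_epi32(_mm_set1_epi32((int)a), _mm_set1_epi32((int)b));
    return (uint32_t)_mm_cvtsi128_si32(m);
}
```

With these, `cmpgt_lane0(0xFFFFFFFFu, 1)` yields 0 because the first operand is treated as -1, even though it is the larger value when read as unsigned.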

The intrinsic names Intel chose might look like they’re for signed integers specifically, but that’s not the case: Intel uses `epi` or `si` for things that work equally on signed and unsigned elements. `epu` implies a specifically unsigned operation, while `epi` can mean a specifically signed operation, an operation that works equally on signed or unsigned, or one where signedness is irrelevant.

For example, `_mm_and_si128` is pure bitwise. `_mm_srli_epi32` is a logical right shift, shifting in zeros, like an unsigned C shift. Shifting in copies of the sign bit is `_mm_srai_epi32` (shift right arithmetic by immediate). Shuffles like `_mm_shuffle_epi32` just move data around in chunks.
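A quick sketch of the two shifts side by side, using SSE2 only. The helpers `srl4_lane0` and `sra4_lane0` are hypothetical names, and the shift count of 4 is an arbitrary choice for the example.

```c
#include <emmintrin.h>
#include <stdint.h>

/* Hypothetical helper: logical right shift by 4, zeros shifted in (psrld). */
uint32_t srl4_lane0(uint32_t x)
{
    __m128i v = _mm_srli_epi32(_mm_set1_epi32((int)x), 4);
    return (uint32_t)_mm_cvtsi128_si32(v);
}

/* Hypothetical helper: arithmetic right shift by 4,
   copies of the sign bit shifted in (psrad). */
uint32_t sra4_lane0(uint32_t x)
{
    __m128i v = _mm_srai_epi32(_mm_set1_epi32((int)x), 4);
    return (uint32_t)_mm_cvtsi128_si32(v);
}
```

For an input with the top bit set, such as `0x80000000`, the logical shift gives `0x08000000` and the arithmetic shift gives `0xF8000000`; for inputs with the top bit clear the two agree.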

Non-widening multiplies like `_mm_mullo_epi16` and `_mm_mullo_epi32` are also the same for signed or unsigned. Only the high-half `_mm_mulhi_epu16` and the widening multiply `_mm_mul_epu32` have unsigned forms as counterparts to their specifically signed `epi16`/`32` forms.
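The point about low-half versus high-half multiplies can be checked with a short SSE2 sketch. The three helper names below are invented for illustration; each broadcasts its inputs and returns lane 0 of the result.

```c
#include <emmintrin.h>
#include <stdint.h>

/* Hypothetical helper: low 16 bits of the product (pmullw).
   Identical whether the inputs are read as signed or unsigned. */
uint16_t mullo_lane0(uint16_t a, uint16_t b)
{
    __m128i r = _mm_mullo_epi16(_mm_set1_epi16((short)a), _mm_set1_epi16((short)b));
    return (uint16_t)_mm_cvtsi128_si32(r);
}

/* Hypothetical helper: high 16 bits of the SIGNED 32-bit product (pmulhw). */
uint16_t mulhi_signed_lane0(uint16_t a, uint16_t b)
{
    __m128i r = _mm_mulhi_epi16(_mm_set1_epi16((short)a), _mm_set1_epi16((short)b));
    return (uint16_t)_mm_cvtsi128_si32(r);
}

/* Hypothetical helper: high 16 bits of the UNSIGNED 32-bit product (pmulhuw). */
uint16_t mulhi_unsigned_lane0(uint16_t a, uint16_t b)
{
    __m128i r = _mm_mulhi_epu16(_mm_set1_epi16((short)a), _mm_set1_epi16((short)b));
    return (uint16_t)_mm_cvtsi128_si32(r);
}
```

With `a = 0xFFFF` and `b = 2`, the low half is `0xFFFE` under either interpretation (65535 * 2 = 0x1FFFE, or -1 * 2 = -2), while the high half is `0xFFFF` for the signed form and `0x0001` for the unsigned form.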

That’s also why 386 only added a scalar integer `imul ecx, esi` form, not also a `mul ecx, esi`, because only the FLAGS setting would differ, not the integer result. And SIMD operations don’t even have FLAGS outputs.

The intrinsics guide unhelpfully describes `_mm_mullo_epi16` as sign-extending and producing a 32-bit product, then truncating to the low 16 bits. The asm manual for `pmullw` also describes it as signed that way; it seems to be talking about it as the companion to signed `pmulhw`. (It also has some bugs, like describing the AVX1 `VPMULLW xmm1, xmm2, xmm3/m128` form as multiplying 32-bit dword elements, probably a copy/paste error from `pmulld`.)

And sometimes Intel’s naming scheme is limited: `_mm_maddubs_epi16` is a u8 x i8 => 16-bit widening multiply, adding pairs horizontally (with signed saturation). I usually have to look up the intrinsic for `pmaddubsw` to remind myself that they named it after the output element width, not the inputs. The inputs have different signedness, so if they have to pick one side, I guess it makes sense to name it for the output, with the signed saturation that can happen with some inputs, like for `pmaddwd`.
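Here is a rough sketch of the mixed signedness of `_mm_maddubs_epi16`, assuming GCC or Clang and a CPU with SSSE3. The helper `maddubs_lane0` is a hypothetical name; it packs one pair of bytes into each operand and returns the first 16-bit result. The `target("ssse3")` attribute is only there so the example compiles without `-mssse3`.

```c
#include <tmmintrin.h>
#include <stdint.h>

/* Hypothetical helper: computes a0*b0 + a1*b1 with signed 16-bit saturation,
   where the a bytes are UNSIGNED and the b bytes are SIGNED (pmaddubsw). */
__attribute__((target("ssse3")))
int16_t maddubs_lane0(uint8_t a0, uint8_t a1, int8_t b0, int8_t b1)
{
    /* Place the two bytes of each operand in the low 16 bits of the vector. */
    __m128i a = _mm_cvtsi32_si128(a0 | (a1 << 8));
    __m128i b = _mm_cvtsi32_si128((uint8_t)b0 | ((uint8_t)b1 << 8));
    __m128i r = _mm_maddubs_epi16(a, b);   /* first arg u8, second arg i8 */
    return (int16_t)_mm_cvtsi128_si32(r);
}
```

For instance, `maddubs_lane0(255, 255, -1, -1)` gives -510, showing that the first operand is read as 255 rather than -1, and `maddubs_lane0(255, 255, 127, 127)` saturates to 32767 because the exact sum 64770 does not fit in a signed 16-bit lane.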