Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Collapse __mask64 aka 64-bit integer value, counting nibbles that have all bits set?

I have a __mask64 as a result of a few AVX512 operations:

__mmask64 mboth = _kand_mask64(lres, hres);

I would like to count the number of nibbles in this that have all bits set (0xF).

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

The simple solution is to do this:

uint64 imask = (uint64)mboth;
while (imask) {
    if (imask & 0xf == 0xf)
        ret++;
    imask = imask >> 4;
} 

I wanted something better, but what I came up with doesn’t feel elegant:

    //outside the loop
    __m512i b512_1s = _mm512_set1_epi32(0xffffffff);
    __m512i b512_0s = _mm512_set1_epi32(0x00000000);

    //then...
    __m512i vboth = _mm512_mask_set1_epi8(b512_0s, mboth, 0xff);
    __mmask16 bits = _mm512_cmpeq_epi32_mask(b512_1s, vboth);
    ret += __builtin_popcount((unsigned int)fres);

The above puts a 0xff byte into a vector where a 1 bit exists in the mask, then gets a 1-bit in the bits mask when the blown-up 0xf nibbles now are now found as 0xffffffff int32‘s.

I feel that two 512-bit operations are way overkill when the original data lives in a 64-bit number. This alternate is probably much worse; it’s too many instructions and still operates on 128 bits:

    //outside the loop
    __m128i b128_1s = _mm_set1_epi32(0xffffffff);

    //then...
    uint64 maskl = mboth & 0x0f0f0f0f0f0f0f0f;
    uint64 maskh = mboth & 0xf0f0f0f0f0f0f0f0;
    uint64 mask128[2] = { (maskl << 4) | maskl, (maskh >> 4) | maskh };
    __m128i bytes   = _mm_cmpeq_epi8(b128_1s, *(__m128i*)mask128);
    uint bits = _mm_movemask_epi8(bytes);
    ret += __builtin_popcount(bits);

>Solution :

With just some scalar operations you can do this:

imask &= imask >> 2;
imask &= imask >> 1;
ret += std::popcount(imask & 0x1111111111111111);

The first two steps put, for every nibble, the horizontal AND of the bits of that nibble in the least significant bit of that nibble. The other bits of the nibble become something that we don’t want here so we just mask them out. Then popcount the result.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading