Slightly odd situation, but I want to use the BSF/BSR instructions inside CUDA. Is there any way to get the equivalent of these instructions in CUDA without a lot of overhead?
>Solution :
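The list of integer intrinsics is available in the CUDA documentation. You can use the __clz intrinsic to mimic BSR: it counts the leading zero bits, so for a non-zero 32-bit value 31 - __clz(x) is the index of the highest set bit. For BSF, I think __ffs should do most of the job: it returns the 1-based position of the lowest set bit (and 0 when no bit is set), so __ffs(x) - 1 gives BSF's 0-based index. Both intrinsics have 64-bit variants, __clzll and __ffsll.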
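As a minimal sketch of the mapping described above: the helper names `bsr32`, `bsf32`, `bit_scan_kernel`, and `bit_scan_on_gpu` are made up for illustration and are not part of CUDA. Only `__clz`, `__ffs`, and the runtime calls are real API. The sketch assumes 32-bit inputs and `n > 0`.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: BSR-like index of the highest set bit (0..31).
// __clz(0) is 32, so this returns -1 for x == 0 (x86 BSR leaves the
// destination undefined in that case).
__device__ __forceinline__ int bsr32(unsigned int x) {
    return 31 - __clz(static_cast<int>(x));
}

// Hypothetical helper: BSF-like index of the lowest set bit (0..31).
// __ffs is 1-based and returns 0 for x == 0, so this returns -1 for x == 0.
__device__ __forceinline__ int bsf32(unsigned int x) {
    return __ffs(static_cast<int>(x)) - 1;
}

// One thread per element: compute both bit-scan results.
__global__ void bit_scan_kernel(const unsigned int* in, int* bsr_out,
                                int* bsf_out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        bsr_out[i] = bsr32(in[i]);
        bsf_out[i] = bsf32(in[i]);
    }
}

// Hypothetical host wrapper (assumes n > 0): copies the input to the device,
// runs the kernel, and copies both result arrays back to the host.
cudaError_t bit_scan_on_gpu(const unsigned int* h_in, int* h_bsr,
                            int* h_bsf, int n) {
    unsigned int* d_in = nullptr;
    int* d_bsr = nullptr;
    int* d_bsf = nullptr;

    cudaError_t err = cudaMalloc(&d_in, n * sizeof(unsigned int));
    if (err == cudaSuccess) err = cudaMalloc(&d_bsr, n * sizeof(int));
    if (err == cudaSuccess) err = cudaMalloc(&d_bsf, n * sizeof(int));
    if (err == cudaSuccess)
        err = cudaMemcpy(d_in, h_in, n * sizeof(unsigned int),
                         cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        const int threads = 128;
        const int blocks = (n + threads - 1) / threads;
        bit_scan_kernel<<<blocks, threads>>>(d_in, d_bsr, d_bsf, n);
        err = cudaGetLastError();  // catches launch-configuration errors
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(h_bsr, d_bsr, n * sizeof(int), cudaMemcpyDeviceToHost);
    if (err == cudaSuccess)
        err = cudaMemcpy(h_bsf, d_bsf, n * sizeof(int), cudaMemcpyDeviceToHost);

    // cudaFree on a null pointer is a no-op, so cleanup is unconditional.
    cudaFree(d_in);
    cudaFree(d_bsr);
    cudaFree(d_bsf);
    return err;
}
```

The zero-input case is the main place where these helpers differ from the x86 instructions, so if your code relies on a particular result for x == 0, handle it explicitly before calling them.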