…or rather, why does omitting the static_cast slow down my function?
Consider the function below, which performs integer division:
int Divide(int x, int y) {
    int ret = 0, i = 32;
    long j = static_cast<long>(y) << i;
    while (x >= y) {
        while (x < j) --i, j >>= 1;
        ret += 1 << i, x -= j;
    }
    return ret;
}
This performs reasonably well, as one might expect.
However, if we remove the static_cast on line 3, like so:
int Divide(int x, int y) {
    int ret = 0, i = 32;
    long j = y << i;
    while (x >= y) {
        while (x < j) --i, j >>= 1;
        ret += 1 << i, x -= j;
    }
    return ret;
}
This version is noticeably slower, sometimes several hundred times slower (I haven’t measured rigorously, but that shouldn’t be far off) for pathological inputs where x is large and y is small. Curious about the cause, I dug into the assembly. However, apart from the difference at line 3, the output is exactly the same. Here’s the line 3 output for reference (source):
With static_cast:
movsxd rax, dword ptr [rbp - 8]
mov ecx, dword ptr [rbp - 16]
shl rax, cl
mov qword ptr [rbp - 24], rax
Without static_cast:
mov eax, dword ptr [rbp - 8]
mov ecx, dword ptr [rbp - 16]
shl eax, cl
cdqe
mov qword ptr [rbp - 24], rax
The rest is identical.
I’m really curious where the overhead is occurring.
>Solution:
It’s very unlikely, at the time of writing, that your int is any larger than a 32-bit type.
The behaviour of a bitwise left shift << by more than 31 places on a 32-bit signed type is undefined.
So y << i with i equal to 32 is undefined unless the first operand is promoted to a long type first, and even then you’re assuming that long is wider than 32 bits (on Windows platforms it’s currently the same size as int).
It appears from the second assembly snippet that, faced with this undefined behaviour, the compiler performs the shift in 32 bits and only sign-extends the result afterwards (the cdqe instruction), which does not produce the value the loop’s logic expects.