…or rather, why does omitting the static_cast slow down my function?
Consider the function below, which performs integer division:
int Divide(int x, int y) {
    int ret = 0, i = 32;
    long j = static_cast<long>(y) << i;
    while (x >= y) {
        while (x < j) --i, j >>= 1;
        ret += 1 << i, x -= j;
    }
    return ret;
}
This performs reasonably well, as one might expect.
However, if we remove the static_cast on line 3, like so:
int Divide(int x, int y) {
    int ret = 0, i = 32;
    long j = y << i;
    while (x >= y) {
        while (x < j) --i, j >>= 1;
        ret += 1 << i, x -= j;
    }
    return ret;
}
This version is noticeably slower, sometimes several hundred times slower (I haven’t measured rigorously, but that shouldn’t be far off) for pathological inputs where x is large and y is small. Curious about the cause, I dug into the assembly. However, apart from the difference at line 3, the output is exactly the same. Here’s the line 3 output for reference (source):
With static_cast:
movsxd rax, dword ptr [rbp - 8]
mov ecx, dword ptr [rbp - 16]
shl rax, cl
mov qword ptr [rbp - 24], rax
Without static_cast:
mov eax, dword ptr [rbp - 8]
mov ecx, dword ptr [rbp - 16]
shl eax, cl
cdqe
mov qword ptr [rbp - 24], rax
The rest is identical.
I’m really curious where the overhead is occurring.
>Solution:
It’s very unlikely, at the time of writing, that your int is any larger than a 32-bit type.
The behaviour of a bitwise left shift << by more than 31 places on a 32-bit signed type is undefined.
So y << i with i equal to 32 is undefined unless the first operand is promoted to a long type first, and even then you’re assuming that long is wider than 32 bits (on Windows platforms it’s currently the same size as int).
It appears from the second assembly snippet that, faced with this undefined behaviour, the compiler performs the shift in 32 bits and only sign-extends the result afterwards (the cdqe instruction), which does not produce the value the loop’s logic expects.