Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Why does static_cast conversion speed up my integer division function?

…or rather, why does not static_cast-ing slow down my function?

Consider the function below, which performs integer division:

int Divide(int x, int y) {
  int ret = 0, i = 32;
  long j = static_cast<long>(y) << i;
  while (x >= y) {
    while (x < j) --i, j >>= 1;
    ret += 1 << i, x -= j;
  }
  return ret;
}

This performs reasonably well, as one might expect.
However, if we remove the static_cast on line 3, like so:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

int Divide(int x, int y) {
  int ret = 0, i = 32;
  long j = y << i;
  while (x >= y) {
    while (x < j) --i, j >>= 1;
    ret += 1 << i, x -= j;
  }
  return ret;
}

This version performs noticeably slower, sometimes several hundreds times slower (I haven’t measured rigorously, but shouldn’t be far off) for pathological inputs where x is large and y is small. I was curious and wanted to look into why, and tried digging into the assembly code. However, apart from the casting differences on line 3, I get the exact same output. Here’s the line 3 output for reference (source):

With static_cast:

movsxd  rax, dword ptr [rbp - 8]
mov     ecx, dword ptr [rbp - 16]
shl     rax, cl
mov     qword ptr [rbp - 24], rax

Without static_cast:

mov     eax, dword ptr [rbp - 8]
mov     ecx, dword ptr [rbp - 16]
shl     eax, cl
cdqe
mov     qword ptr [rbp - 24], rax

The rest is identical.

I’m really curious where the overhead is occurring.

>Solution :

It’s very unlikely at the time of writing that your int is any larger than a 32 bit type.

The behaviour of a bitwise left shift << of any more than 31 places on a 32 bit signed type is undefined.

So y << i is undefined without first promoting the first argument to a long type – and even then you’re assuming that long is wider than 32 bits (currently on Windows platforms it’s the same size as int).

It appears in the second assembly snippet that the compiler is getting confused and making a mess of things with machine level sign extension instructions.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading