Why do none of the major compilers optimize this conditional move?

I stumbled across this Reddit post, which makes a joke about the following code snippet,

void f(int& x) {
    if (x != 1) {
        x = 1;
    }
}

void g(int& x) {
    x = 1;
}

saying that the two functions are not equivalent to ‘the compiler’.
I was convinced that any of the major C++ compilers would optimize the conditional assignment away to an unconditional store, thus emitting the same assembly code for f and g.

However, they don’t.

Can anyone explain to me why this is the case?

What I am thinking is this: the unconditional store would most likely be faster, because we have to access memory anyway to read the value for the comparison, and the branching code puts stress on the branch predictor. Also, stores should not be considered side effects by the compiler (AFAIK), even though subsequent memory accesses might be faster or slower depending on whether the branch in f was taken, due to cache locality.

So are the compilers just not able to figure this out? While the equivalence of f and g might not be trivial to prove, I feel like there are much harder problems that these compilers are able to solve. So am I wrong and these functions are not equivalent after all, or what is going on here?

>Solution :

It wouldn’t be safe for an object like static const int val = 1; that the implementation places in read-only memory. The unconditional-store version (g) will segfault trying to write to read-only memory.

The version that checks first is safe to call on that object in the C++ abstract machine (via const_cast), so the optimizer has to respect the possibility that any object that’s not written to was originally const and in read-only memory.

That also wouldn’t be thread-safe. In general the compiler must not invent writes to objects the abstract machine doesn’t write, in case another thread is also writing it and we’d step on the value. (Except atomic RMWs are safe, like a compare-exchange.)

Since we already read the object, we could maybe assume no other thread writes since that would already be data-race UB with our unconditional read.

If many threads are running this code on the same object, unconditional writes would be safe on normal CPU architectures, but much slower (contention for MESI exclusive ownership of the cache line, vs. shared.)

Dirtying a cache line is also something that might not be desirable.

(And safe only because they’re all storing the same value. If even one thread was storing a different value, it might have that store overwritten if it happened to not be last in the modification order as determined by order of CPUs getting ownership of the cache line to commit their stores.)
