Will dereferencing pointers always cause memory access?

I wonder whether dereferencing a pointer will always be translated into a machine-level load/store instruction, regardless of how aggressively the compiler optimizes.

Suppose we have two threads: one (let's call it Tom) receives user input and writes a bool variable. The variable is read by another thread (let's call it Jerry) to decide whether to continue a loop. We know that an optimizing compiler may keep the variable in a register when compiling the loop. So, at run time, Jerry may read an obsolete value that is different from what Tom actually writes. As a result, we should declare the bool variable as volatile.

However, if dereferencing a pointer always causes a memory access, then the two threads can use a pointer to reference the variable. On every write, Tom will store the new value into memory by dereferencing the pointer and writing to it. On each read, Jerry can really read what Tom wrote by dereferencing that same pointer. This seems better than the implementation-dependent volatile.
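For concreteness, the scheme described here might be sketched like this (the Tom/Jerry function names are invented for illustration; note that between two real threads this is still a data race):

```cpp
// Hypothetical sketch of the pointer-based idea.
// WARNING: if tom() and jerry() run in different threads with no
// synchronization, this is a data race and undefined behavior in C++.
bool keep_running = true;

void tom(bool *flag) {
    *flag = false;     // Tom writes the new value through the pointer
}

bool jerry(const bool *flag) {
    return *flag;      // Jerry reads through the same pointer
}
```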


I’m new to multi-threading programming, so this idea may seem trivial and unnecessary. But I’m really curious about it.

>Solution:

Will dereferencing a pointer always cause memory access?

No, for example:

int five() {
    int x = 5;
    int *ptr = &x;
    return *ptr;
}

Any sane optimizing compiler will not emit a mov from stack memory here, but something along the lines of:

five():
  mov eax, 5
  ret

This is allowed because of the as-if rule.
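The same reasoning applies inside a loop: if nothing the compiler can see modifies `*flag`, it may load the value once and keep it in a register for every iteration. A sketch, using a hypothetical helper:

```cpp
// Counts iterations while *flag is set, up to limit.
// Nothing in the loop body writes *flag, so under the as-if rule the
// compiler may load *flag once and reuse that value on every iteration.
int count_while_set(bool *flag, int limit) {
    int n = 0;
    while (*flag && n < limit) {
        ++n;
    }
    return n;
}
```

So going through a pointer, by itself, guarantees nothing about repeated memory accesses.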

How do I do inter-thread communication through a bool* then?

This is what std::atomic<bool> is for.
You shouldn’t communicate between threads using non-atomic objects, because unsynchronized access to the same memory from two threads (a data race) is undefined behavior in C++. std::atomic makes the access thread-safe.
For example:

void thread(std::atomic<bool> &stop_signal) {
    while (!stop_signal) {
        do_stuff();
    }
}
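The writer side is symmetric: the other thread stores through the same std::atomic&lt;bool&gt;, and the store is guaranteed to become visible to the reader. A minimal sketch (function names are invented):

```cpp
#include <atomic>
#include <thread>

std::atomic<bool> stop_signal{false};

void worker() {
    while (!stop_signal.load()) {
        // each iteration performs a real atomic load
    }
}

void run() {
    std::thread t(worker);
    stop_signal.store(true);   // atomic store; the worker will observe it
    t.join();                  // worker exits once it sees true
}
```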

Technically, this doesn’t imply that each load from stop_signal will actually happen. The compiler is allowed to do partial loop unrolling like:

void thread(std::atomic<bool> &stop_signal) {
    // only possible if the compiler knows that do_stuff() doesn't modify stop_signal
    while (!stop_signal) {
        do_stuff();
        do_stuff();
        do_stuff();
        do_stuff();
    }
}

An atomic load() is allowed to observe a stale value, so the compiler can assume that four consecutive load()s would all read the same value. Only read-modify-write operations, like fetch_add(), are required to observe the most recent value. Even then, this optimization might be possible.

In practice, optimizations like these aren’t implemented for std::atomic in any compiler, so std::atomic is quasi-volatile. The same applies to C’s atomic_bool, and _Atomic types in general.

