CUDA shared memory read/write order within a single thread

Shared memory is not automatically synchronized between the threads in a block. What I don’t know is whether a thread is guaranteed to see its own writes to shared memory.

For example, consider this kernel:

__global__ void kernel()
{
    __shared__ int i, j;

    if(threadIdx.x == 0)
    {
        i = 10;
        j = i;
    }

    // #1
}

Is it guaranteed at #1 that, for thread 0, i == 10 and j == 10, or do I need a memory fence or a local variable?

>Solution :

I’m going to assume that by "for thread 0" you mean "the thread that passed the if-test". For the sake of this discussion, I will also assume there is only one such thread.

Yes, it’s guaranteed. A single thread always observes its own reads and writes in program order; otherwise basic C++ compliance would be broken in CUDA.
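As a rough way to check this yourself, here is a minimal sketch in which thread 0 reads i and j back at point #1 and copies them to global memory so the host can inspect them. The kernel name single_thread_order, the out parameter, and the host code are illustrative assumptions, not part of the question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical variant of the kernel from the question: thread 0 writes
// i and j, then reports what it reads back at point #1.
__global__ void single_thread_order(int *out)
{
    __shared__ int i, j;

    if (threadIdx.x == 0)
    {
        i = 10;
        j = i;  // same thread, so this read sees the write above
    }

    // #1: only thread 0 reads here, so no fence or barrier is involved.
    if (threadIdx.x == 0)
    {
        out[0] = i;
        out[1] = j;
    }
}

int main()
{
    int *d_out = nullptr;
    int h_out[2] = {0, 0};

    if (cudaMalloc(&d_out, sizeof(h_out)) != cudaSuccess) return 1;

    single_thread_order<<<1, 32>>>(d_out);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    if (cudaMemcpy(h_out, d_out, sizeof(h_out),
                   cudaMemcpyDeviceToHost) != cudaSuccess) return 1;
    cudaFree(d_out);

    printf("i=%d j=%d\n", h_out[0], h_out[1]);
    return (h_out[0] == 10 && h_out[1] == 10) ? 0 : 1;
}
```

Only the writing thread reads the values here, which is why no __syncthreads() or fence is needed.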

The challenges in CUDA arise in inter-thread communication and visibility. However, your question does not involve that.

As an example, it is certainly not guaranteed that some other thread will see i as 10 without some sort of fence or barrier.
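If other threads in the block do need to read the value, a block-wide barrier such as __syncthreads() is the usual tool. The following is a minimal sketch under that assumption; the kernel name broadcast_with_barrier, the out buffer, and the block size of 64 are made up for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: thread 0 writes i, and every thread in the block
// reads it after a barrier.
__global__ void broadcast_with_barrier(int *out)
{
    __shared__ int i;

    if (threadIdx.x == 0)
    {
        i = 10;
    }

    // Block-wide barrier, placed outside the divergent branch so every
    // thread reaches it. It also makes thread 0's shared-memory write
    // visible to the other threads in the block.
    __syncthreads();

    out[threadIdx.x] = i;
}

int main()
{
    const int n = 64;  // threads in the single block (illustrative)
    int *d_out = nullptr;
    int h_out[n] = {0};

    if (cudaMalloc(&d_out, sizeof(h_out)) != cudaSuccess) return 1;

    broadcast_with_barrier<<<1, n>>>(d_out);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    if (cudaMemcpy(h_out, d_out, sizeof(h_out),
                   cudaMemcpyDeviceToHost) != cudaSuccess) return 1;
    cudaFree(d_out);

    int bad = 0;
    for (int k = 0; k < n; ++k)
    {
        if (h_out[k] != 10) ++bad;
    }
    printf("threads that did not read 10: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
```

Without the __syncthreads() call, the read of i in threads other than thread 0 would race with the write, and the value they see would not be guaranteed.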
