Shared memory is not automatically synchronized between the threads in a block. But I don’t know whether a thread is guaranteed to see its own writes to shared memory.
For example, in this kernel:
__global__ void kernel()
{
    __shared__ int i, j;
    if(threadIdx.x == 0)
    {
        i = 10;
        j = i;
    }
    // #1
}
Is it guaranteed at #1 that, for thread 0, i=10 and j=10, or do I need some memory fence or introduce a local variable?
>Solution :
I’m going to assume that by "for thread 0" you mean "the thread that passed the if-test". And for the sake of this discussion, I will assume there is only one of those.
Yes, it’s guaranteed. Otherwise basic C++ compliance would be broken in CUDA.
Challenges in CUDA may arise in inter-thread communication or behavior. However, that is not what your question is about.
As an example, it is certainly not guaranteed that for some other thread, i will be visible as 10, without some sort of fence or barrier.
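To make the distinction concrete, here is a minimal sketch of both cases. The output buffers out_self and out_others, the block size of 64, and the host-side check are my own additions for illustration and are not part of the question. Thread 0 reads back its own writes with no fence, while the other threads read i only after a __syncthreads() barrier that every thread in the block reaches.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// out_self and out_others are hypothetical buffers added for this sketch.
__global__ void kernel(int *out_self, int *out_others)
{
    __shared__ int i, j;
    if (threadIdx.x == 0)
    {
        i = 10;
        j = i;
        // Same thread reading its own writes: ordinary program order
        // applies, so no fence or local variable is needed here.
        out_self[0] = i;
        out_self[1] = j;
    }
    // Barrier reached by every thread in the block. Shared-memory writes
    // made before it are visible to all threads of the block after it.
    __syncthreads();
    out_others[threadIdx.x] = i;
}

int main()
{
    const int threads = 64; // assumed block size, one block only
    int *d_self = nullptr, *d_others = nullptr;
    cudaMalloc(&d_self, 2 * sizeof(int));
    cudaMalloc(&d_others, threads * sizeof(int));

    kernel<<<1, threads>>>(d_self, d_others);
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
    {
        printf("launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    int h_self[2];
    int h_others[threads];
    // cudaMemcpy blocks until the kernel has finished.
    cudaMemcpy(h_self, d_self, sizeof(h_self), cudaMemcpyDeviceToHost);
    cudaMemcpy(h_others, d_others, sizeof(h_others), cudaMemcpyDeviceToHost);

    // Count threads that did not observe i == 10 after the barrier.
    int mismatches = 0;
    for (int t = 0; t < threads; ++t)
        if (h_others[t] != 10)
            ++mismatches;

    printf("thread 0 saw i=%d, j=%d\n", h_self[0], h_self[1]);
    printf("threads that did not see i == 10: %d\n", mismatches);

    cudaFree(d_self);
    cudaFree(d_others);
    return mismatches;
}
```

If the __syncthreads() call were removed, the reads in the other threads would race with thread 0’s write and the value they observe would no longer be defined. The barrier must stay outside the if-block so that all threads in the block execute it.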