I am currently studying CUDA and have learned that there are two kinds of memory: global memory and shared memory.
Checking the CUDA documentation, I found that GPUs access shared memory and global memory using the ld.shared/st.shared and ld.global/st.global instructions, respectively.
What I am curious about is: which instruction is used to load data from global memory into shared memory?
It would be great if someone could let me know.
Thanks!
__global__ void my_function(int* global_mem)
{
    __shared__ int shared_mem[10];
    for (int i = 0; i < 10; i++) {
        shared_mem[i] = global_mem[i]; // What instruction is used for this load operation?
    }
}
>Solution :
In the case of
__shared__ float smem[2];
smem[0] = global_memory[0];
the compiler emits a pair of SASS instructions: a load from global memory into a register, followed by a store from that register into shared memory:
LDG Rx, [Ry]
STS [Rz], Rx
For more on the difference between the generic LD and LDG, see https://forums.developer.nvidia.com/t/whats-different-between-ld-and-ldg-load-from-generic-memory-vs-load-from-global-memory/40856/2
Summary:
LDS is a load from the shared space
LDC is a load from the constant space
LDG is a load from the global space
LD is a generic load -> the space is deduced from the supplied address.
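If you want to verify this yourself, here is a minimal sketch (the kernel name and array size are illustrative) that you can compile and disassemble with cuobjdump, which ships with the CUDA toolkit:

```
__global__ void copy_to_shared(const int* global_mem)
{
    __shared__ int shared_mem[10];

    // Each assignment below compiles to a global load into a register (LDG)
    // followed by a store from that register into shared memory (STS).
    for (int i = 0; i < 10; i++) {
        shared_mem[i] = global_mem[i];
    }

    __syncthreads(); // make the shared data visible to all threads in the block
    // ... use shared_mem ...
}
```

Compile to a cubin and dump the SASS, e.g. `nvcc -arch=sm_70 -cubin kernel.cu -o kernel.cubin` followed by `cuobjdump -sass kernel.cubin`. The exact instruction forms (operand sizes, predication) vary by GPU architecture, but you should see the LDG/STS pair in the loop body.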