
In CUDA, what instruction is used to load data from global memory to shared memory?

I am currently studying CUDA and learned that there are global memory and shared memory.

I have checked the CUDA documentation and found that GPUs can access shared memory and global memory using the ld.shared/st.shared and ld.global/st.global instructions, respectively.

What I am curious about is what instruction is used to load data from global memory to shared memory?


It would be great if someone could let me know.

Thanks!

__global__ void my_function(int* global_mem)
{
    __shared__ int shared_mem[10];
    for (int i = 0; i < 10; i++) {
        shared_mem[i] = global_mem[i];  // What instruction is used for this load operation?
    }
}

Solution:

In the case of

__shared__ float smem[2];
smem[0] = global_memory[0];

the copy compiles to two SASS instructions, with the data passing through a register:

LDG  Rx, [Ry]   // load from global memory into register Rx
STS  [Rz], Rx   // store register Rx into shared memory
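You can verify this on your own kernel by disassembling the compiled output with cuobjdump (assuming the CUDA toolkit is installed; the file name and architecture flag below are just examples):

```
# Compile the kernel to a cubin for a specific architecture (sm_70 is an example)
nvcc -arch=sm_70 -cubin my_function.cu -o my_function.cubin

# Disassemble and filter for the global-load / shared-store pair
cuobjdump -sass my_function.cubin | grep -E 'LDG|STS'
```

Each global-to-shared assignment in the source should show up as an LDG followed by an STS in the listing.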

To expand a bit more, read https://forums.developer.nvidia.com/t/whats-different-between-ld-and-ldg-load-from-generic-memory-vs-load-from-global-memory/40856/2

Summary

LDS is a load from the shared space.

LDC is a load from the constant space.

LDG is a load from the global space.

LD is a generic load; the address space is deduced from the supplied address.
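As a side note, the single-threaded loop in the question serializes the copy. The usual pattern is to let each thread of the block copy one element, which still compiles to the same LDG/STS pair, but one pair per thread in parallel. A minimal sketch (kernel name and sizes are illustrative):

```cuda
__global__ void copy_to_shared(const int* global_mem)
{
    __shared__ int shared_mem[10];

    // Each thread copies one element: one LDG (global -> register)
    // followed by one STS (register -> shared) per thread.
    int i = threadIdx.x;
    if (i < 10) {
        shared_mem[i] = global_mem[i];
    }
    __syncthreads();  // make the shared data visible to the whole block

    // ... the block can now work on shared_mem ...
}
```

The __syncthreads() barrier is needed because a thread may read elements of shared_mem that were written by other threads.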
