In CUDA, what instruction is used to load data from global memory to shared memory?
I am currently studying CUDA and have learned that it has both global memory and shared memory. I checked the CUDA documentation and found that GPUs access shared memory with the ld.shared/st.shared instructions and global memory with the ld.global/st.global instructions. What I am curious about is what instruction is used to load data from global memory to…
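To make the question concrete, here is a minimal sketch of the kind of copy being asked about: each thread stages one element from global memory into a `__shared__` array. The names `kTile`, `copy_through_shared_kernel`, `copy_through_shared`, and `check` are hypothetical and chosen only for this illustration. For a plain assignment like `tile[threadIdx.x] = in[i]`, the compiler typically emits an `ld.global` into a register followed by an `st.shared`, rather than a single combined instruction. On devices of compute capability 8.0 and newer, PTX also provides `cp.async`, which can copy from global to shared memory without passing through a register. Compiling the file with `nvcc -ptx` should let you inspect what is actually generated for your target.

```cuda
#include <cassert>
#include <stdexcept>
#include <vector>
#include <cuda_runtime.h>

constexpr int kTile = 256;  // hypothetical block size chosen for this sketch

// Each thread copies one element global -> shared, then shared -> global.
__global__ void copy_through_shared_kernel(const float* in, float* out, int n) {
    __shared__ float tile[kTile];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Global load into a register, then a store into shared memory.
        tile[threadIdx.x] = in[i];
    }
    // Not strictly needed here, since each thread reads back its own element,
    // but required whenever threads read elements staged by other threads.
    __syncthreads();
    if (i < n) {
        // Shared load into a register, then a store into global memory.
        out[i] = tile[threadIdx.x];
    }
}

// Turn a CUDA runtime error into a C++ exception.
static void check(cudaError_t err) {
    if (err != cudaSuccess) throw std::runtime_error(cudaGetErrorString(err));
}

// Host helper: round-trips the input through the kernel above.
std::vector<float> copy_through_shared(const std::vector<float>& input) {
    assert(!input.empty());
    const int n = static_cast<int>(input.size());
    const size_t bytes = n * sizeof(float);

    float* d_in = nullptr;
    float* d_out = nullptr;
    check(cudaMalloc(&d_in, bytes));
    check(cudaMalloc(&d_out, bytes));
    check(cudaMemcpy(d_in, input.data(), bytes, cudaMemcpyHostToDevice));

    const int blocks = (n + kTile - 1) / kTile;  // round up to cover all n elements
    copy_through_shared_kernel<<<blocks, kTile>>>(d_in, d_out, n);
    check(cudaGetLastError());

    std::vector<float> output(n);
    // This blocking copy also waits for the kernel to finish.
    check(cudaMemcpy(output.data(), d_out, bytes, cudaMemcpyDeviceToHost));
    cudaFree(d_in);
    cudaFree(d_out);
    return output;
}
```

Calling `copy_through_shared` on any non-empty vector should return an identical vector. The point of the sketch is the generated PTX for the kernel, not the result of the copy.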