CUDA: number of elements is larger than the number of assigned threads

I am new to CUDA programming.
I am curious what happens if the number of elements is larger than the number of threads.

Take this simple vector_add example:

__global__
void add(int n, float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) 
        y[i] = x[i] + y[i];
}

Say the number of array elements is 10,000,000, and we call this function using 64 blocks and 256 threads per block:

int n = 10000000;      // 10,000,000 elements
int grid_size = 64;    // number of blocks
int block_size = 256;  // threads per block

Then only 64 * 256 = 16384 threads are launched. What would happen to the rest of the array elements?

Solution:

what would happen to the rest of the array elements?

Nothing at all. They wouldn’t be touched and would remain unchanged. The x array is only read by the kernel, so the question really concerns y: the values of y[0..16383] would reflect the result of the vector add, and the values of y[16384..9999999] would be unchanged.
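One way to check this yourself is the minimal sketch below. It launches the kernel from the question with 64 blocks of 256 threads on 10,000,000 elements and counts how many entries of y were actually updated. The helper name `count_updated`, the fill values 1.0f and 2.0f, and the use of managed memory are assumptions made for illustration, not part of the original question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel from the question: one element per thread, guarded by i < n.
__global__ void add(int n, float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = x[i] + y[i];
}

// Hypothetical helper (not from the question): fills x with 1 and y with 2,
// launches add with the given configuration, and returns how many elements
// of y ended up equal to 3. Returns -1 if a CUDA call fails.
long long count_updated(int n, int grid_size, int block_size)
{
    float *x = nullptr, *y = nullptr;
    if (cudaMallocManaged(&x, n * sizeof(float)) != cudaSuccess) return -1;
    if (cudaMallocManaged(&y, n * sizeof(float)) != cudaSuccess) {
        cudaFree(x);
        return -1;
    }
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    add<<<grid_size, block_size>>>(n, x, y);
    cudaError_t err = cudaDeviceSynchronize();  // wait for the kernel to finish

    long long updated = 0;
    if (err == cudaSuccess)
        for (int i = 0; i < n; ++i)
            if (y[i] == 3.0f) ++updated;  // untouched elements still hold 2.0f

    cudaFree(x);
    cudaFree(y);
    return err == cudaSuccess ? updated : -1;
}

int main()
{
    const int n = 10000000;
    long long updated = count_updated(n, 64, 256);
    printf("updated %lld of %d elements\n", updated, n);
    return 0;
}
```

With a working CUDA device, this should report 16384 updated elements, matching 64 * 256. If you want to keep the one-element-per-thread kernel, a common alternative is to size the grid from n, for example `(n + block_size - 1) / block_size` blocks, so that every element gets a thread.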

For this reason (to conveniently handle arbitrary data set sizes independent of the chosen grid size), people sometimes suggest a grid-stride-loop kernel design.
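A grid-stride loop version of the kernel might look like the sketch below. Each thread starts at its global index and then advances by the total number of threads in the grid, so the same 64 x 256 launch covers all n elements. The names `add_grid_stride` and `count_mismatches` and the test values are assumptions for illustration only.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Grid-stride variant of the kernel: each thread handles elements
// index, index + stride, index + 2 * stride, ... until it runs past n.
__global__ void add_grid_stride(int n, const float *x, float *y)
{
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;  // total number of threads in the grid
    for (int i = index; i < n; i += stride)
        y[i] = x[i] + y[i];
}

// Hypothetical helper (not from the question): fills x with 1 and y with 2,
// runs the kernel, and returns how many elements of y are NOT equal to 3.
// Returns -1 if a CUDA call fails.
long long count_mismatches(int n, int grid_size, int block_size)
{
    float *x = nullptr, *y = nullptr;
    if (cudaMallocManaged(&x, n * sizeof(float)) != cudaSuccess) return -1;
    if (cudaMallocManaged(&y, n * sizeof(float)) != cudaSuccess) {
        cudaFree(x);
        return -1;
    }
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    add_grid_stride<<<grid_size, block_size>>>(n, x, y);
    cudaError_t err = cudaDeviceSynchronize();  // wait for the kernel to finish

    long long mismatches = 0;
    if (err == cudaSuccess)
        for (int i = 0; i < n; ++i)
            if (y[i] != 3.0f) ++mismatches;

    cudaFree(x);
    cudaFree(y);
    return err == cudaSuccess ? mismatches : -1;
}

int main()
{
    const int n = 10000000;
    long long mismatches = count_mismatches(n, 64, 256);
    printf("elements not updated: %lld\n", mismatches);
    return 0;
}
```

With this design the grid size becomes a tuning knob rather than a correctness requirement: any launch configuration with at least one thread processes the whole array.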
