Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

CUDA C++ Pointer Typecasting

I was looking at CUDA C++ documentation. But there is something I didn’t get about pointer typecasting. Below there are host and device code.

// Host code
int width = 64, height = 64;
float* devPtr;
size_t pitch;
cudaMallocPitch(&devPtr, &pitch,
                width * sizeof(float), height);
MyKernel<<<100, 512>>>(devPtr, pitch, width, height);

// Device code
__global__ void MyKernel(float* devPtr,
                         size_t pitch, int width, int height)
{
    for (int r = 0; r < height; ++r) {
        float* row = (float*)((char*)devPtr + r * pitch);
        for (int c = 0; c < width; ++c) {
            float element = row[c];
        }
    }
}

As you can see devPtr is typecasted into char. But I didn’t get why typecasted into char rather than incrementing as float type.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

This is to handle a pitched allocation (the type created by cudaMallocPitch()).

A pitched allocation "rounds up" the requested width of the allocation to a particular pitch, and this pitch is specified in bytes:

cudaMallocPitch(&devPtr, &pitch,
                          ^
                          | 
               this value is indicated by the function as a row width or "pitch" in bytes

Because the pitch is specified in bytes, to get proper pointer arithmetic:

((char*)devPtr + r * pitch);
               ^
               |
           pointer arithmetic

the pointer type must also be a byte-type. The objective of that code snippet is to increment devPtr by a number of rows specified by r, each row consists of pitch bytes.

AFAIK, in CUDA, there is nothing that guarantees any particular granularity of pitch as returned by cudaMallocPitch. It is theoretically possible for it to be an odd number of bytes, or a prime number of bytes, for example. So playing tricks to pre-convert the pitch value to an equivalent (pointer arithmetic) offset in other type-widths would be frowned on.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading