How strided memcpy(3) works in libvpx

August 4, 2023

I’m trying to understand the following function in libvpx (vp8/common/reconinter.c):

void vp8_copy_mem16x16_c(unsigned char *src, int src_stride, unsigned char *dst,
                         int dst_stride) {
  int r;

  for (r = 0; r < 16; ++r) {
    memcpy(dst, src, 16);

    src += src_stride;
    dst += dst_stride;
  }
}

(8×8 and 8×4 versions also exist in the same source file.)

It copies 16 bytes from src to dst 16 times, but after each copy it advances src and dst by their own custom strides. Without prior knowledge of computer graphics and DSP, I find these functions very confusing: what is the point of supporting custom strides for src and dst? What are some examples or benefits of using such functions rather than just copying all 16 x 16 bytes at once?

Thank you very much!

Update: to be clear, during the build stage vp8_copy_mem16x16 is defined to refer to vp8_copy_mem16x16_c when no vector-optimized (SIMD) version is available for the target platform.

Solution:

If I understand correctly, your question is what the stride is for.

In the context of libvpx, there are two major use cases for it:

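1. Working with individual blocks of the source frame during encoding. If you have an entire image in memory, you can extract a block efficiently by passing the image's row stride as the source stride, i.e. the number of bytes from the start of one row to the start of the next (the image width plus any row padding). Because the function adds the stride to the pointer after each row's memcpy, the stride must be this full row pitch, not just the gap between rows. The destination stride is whatever layout your algorithm needs: 16 to pack the block into a contiguous 16 x 16 buffer, or another image's row stride to write the block into a second frame. (A destination stride of 0 would simply overwrite the same 16 bytes on every iteration.) Edit: to be clear, most video encoding and decoding operations work on square or rectangular blocks. JPEG is an example of this, and the codecs commonly stored in MP4 files (such as H.264) as well as VP8/VP9 are also block-based. Copying a block into or out of a frame is a very basic, very frequently used operation.
2. While most APIs allow image dimensions that are not powers of two, efficient memory access, especially on the GPU, often calls for power-of-two row sizes, or at least for some alignment padding at the end of each row. The source and the destination can have different padding requirements, which is why both stride arguments are needed.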
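The two points above can be made concrete with a small sketch. It assumes a hypothetical 8-bit grayscale frame whose visible width of 40 pixels is padded to a row stride of 64 bytes; `copy_mem16x16` is a local copy of the question's function (not the libvpx symbol), and `fill_frame`, `extract_block` and `insert_block` are hypothetical helpers. Extracting a block uses the frame's stride as the source stride and 16 as the destination stride; writing it back into a second frame uses that frame's own, different stride.

```c
#include <string.h>

#define FRAME_WIDTH  40 /* visible pixels per row (hypothetical) */
#define FRAME_HEIGHT 32 /* rows in the frame (hypothetical) */
#define FRAME_STRIDE 64 /* bytes per row including padding (hypothetical) */
#define BLOCK        16 /* macroblock width and height */

/* Same logic as vp8_copy_mem16x16_c: copy 16 rows of 16 bytes,
   advancing each pointer by its own stride after every row. */
static void copy_mem16x16(const unsigned char *src, int src_stride,
                          unsigned char *dst, int dst_stride) {
  for (int r = 0; r < BLOCK; ++r) {
    memcpy(dst, src, BLOCK);
    src += src_stride;
    dst += dst_stride;
  }
}

/* Fill the visible area with a recognizable pattern; padding stays 0. */
static void fill_frame(unsigned char *frame) {
  memset(frame, 0, FRAME_STRIDE * FRAME_HEIGHT);
  for (int y = 0; y < FRAME_HEIGHT; ++y)
    for (int x = 0; x < FRAME_WIDTH; ++x)
      frame[y * FRAME_STRIDE + x] = (unsigned char)(y * FRAME_WIDTH + x);
}

/* Pack the block whose top-left corner is (bx, by) into a contiguous
   16x16 buffer: source stride = frame stride, destination stride = 16. */
static void extract_block(const unsigned char *frame, int bx, int by,
                          unsigned char *block) {
  copy_mem16x16(frame + by * FRAME_STRIDE + bx, FRAME_STRIDE, block, BLOCK);
}

/* Write a contiguous block into another frame with its own stride. */
static void insert_block(const unsigned char *block, unsigned char *frame,
                         int frame_stride, int bx, int by) {
  copy_mem16x16(block, BLOCK, frame + by * frame_stride + bx, frame_stride);
}
```

Only the 16 bytes of each row are touched, so the padding bytes and the pixels around the block stay as they were.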

More generally, there is a third use case for strides: sprite blitting. As with the first point above, you can very efficiently blit opaque sprites to textures (or to the screen, if there's no double buffering) by copying them one row at a time with the appropriate strides.
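A minimal sketch of sprite blitting with the same row-by-row pattern, generalized to arbitrary rectangle sizes. It assumes 8-bit pixels, opaque sprites (no transparency, so a plain memcpy per row is enough) and no clipping; `blit_rect` and `blit_cell` are hypothetical names. The source is one cell of a sprite sheet whose cells sit side by side, so the source stride is the width of the whole sheet, while the destination stride is the canvas width.

```c
#include <stddef.h>
#include <string.h>

/* Copy `height` rows of `width` bytes, advancing each pointer by its
   own stride after every row (vp8_copy_mem16x16_c with variable size). */
static void blit_rect(const unsigned char *src, int src_stride,
                      unsigned char *dst, int dst_stride,
                      int width, int height) {
  for (int r = 0; r < height; ++r) {
    memcpy(dst, src, (size_t)width);
    src += src_stride;
    dst += dst_stride;
  }
}

/* Draw cell number `cell` (each w x h, laid out horizontally in a sheet
   whose rows are `sheet_stride` bytes apart) onto a canvas at (x, y).
   No clipping: the caller must make sure the cell fits on the canvas. */
static void blit_cell(const unsigned char *sheet, int sheet_stride,
                      int cell, int w, int h,
                      unsigned char *canvas, int canvas_stride,
                      int x, int y) {
  blit_rect(sheet + cell * w, sheet_stride,
            canvas + y * canvas_stride + x, canvas_stride, w, h);
}
```

For example, with three 4x4 cells packed into a 12-byte-wide sheet, `blit_cell(sheet, 12, 2, 4, 4, canvas, 10, 5, 3)` copies the third cell onto a 10-byte-wide canvas with its top-left corner at (5, 3). Sprites with transparency would need a per-pixel test or blend instead of memcpy.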