Hopefully a very simple question! I have a particle system that I’m updating with a compute shader (just updating a buffer with position data). Right now I’m reading from an input buffer and writing to a different output buffer (that will be the input buffer next frame). But I was wondering, since each update is reading/writing to the same slot, whether it might be more efficient, or even possible, to update the buffer in-situ rather than having to create two and ping-pong back and forth.
I suspect the answer here is "no, this is a bad idea", since I’m sure there’s a whole host of undefined behaviors around caches, memory access ordering, etc, but I thought it was worth asking if there’s any approach that could work on a buffer in-place like this, since it would be nice to save that extra memory (and if it’s faster!).
If each individual shader invocation only touches its own bytes in the buffer and nobody else’s… then you’re fine. Do read/modify/writes to your heart’s content. That invocation can even read back the data it just wrote, and it will be visible to that invocation. No special synchronization is required.
And it’s not your job to worry about cache issues from neighboring accesses here; that’s the GPU’s job. As long as each invocation reads and writes memory disjoint from every other invocation’s, there is no problem. In particular, you don’t need to know the cache line size.
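As a concrete sketch of what this looks like, here’s a minimal in-place update in GLSL. The `Particle` layout, binding point, and `dt` uniform are just assumptions for illustration; the point is that invocation `i` reads and writes only slot `i` of a single buffer:

```glsl
#version 430
layout(local_size_x = 64) in;

struct Particle {
    vec4 pos; // xyz = position, w unused
    vec4 vel; // xyz = velocity, w unused
};

// One buffer, used for both input and output — no ping-pong.
layout(std430, binding = 0) buffer Particles {
    Particle particles[];
};

uniform float dt;

void main() {
    uint i = gl_GlobalInvocationID.x;
    if (i >= particles.length()) return;

    // Read/modify/write: this invocation only touches slot i,
    // so no other invocation can race with it.
    Particle p = particles[i];
    p.pos.xyz += p.vel.xyz * dt;
    particles[i] = p;
}
```

If invocations ever needed to read each other’s slots (e.g. for particle–particle interactions), this would stop being safe and you’d be back to ping-ponging or explicit synchronization.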
Now, you do need appropriate synchronization for whatever process is going to further use that data. But you were going to need that anyway.
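For example, in OpenGL that synchronization is a memory barrier between the dispatch and whatever consumes the buffer (the equivalent in Vulkan would be a `VkBufferMemoryBarrier`). A sketch, where `numParticles` and `drawParticles()` are hypothetical names for your own code:

```c
// Launch one invocation per particle (local_size_x = 64).
glDispatchCompute((numParticles + 63) / 64, 1, 1);

// Make the compute shader's SSBO writes visible to shader
// stages that read the same buffer during rendering.
glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);

drawParticles(); // hypothetical draw call that reads the buffer
```

Note this barrier is needed whether you update in place or ping-pong; it orders the dispatch against the draw, not invocations against each other.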