C++ Parallelization Without Threads?

I recently read this answer discussing pipelining. The question asked why a loop that sums two lists into two separate variables was faster than a loop that xors the same lists into a single variable. The linked answer concluded that the sums could run in parallel, while each xor had to be computed after the previous one, which produced the observed effect.

I do not understand. Doesn’t efficient parallelization require multiple threads? How can these additions be run in parallel on only one thread?

Additionally, if the compiler is so smart that it can magick in a whole new thread, why can’t it just create two variables in the second function, execute the xor-s in parallel, and then xor the two variables back together after the loop terminates? To any human, such an optimization would be obvious. Is it harder to program such an optimization into the compiler than I realize?

Any explanation would be greatly appreciated!

>Solution :

Modern CPUs are built around an instruction pipeline. Executing an instruction involves several steps (decoding it, doing the calculation, reading or writing main memory, reading or writing registers, ...), and for any single instruction these steps must happen one after the other.
Various optimizations let the pipeline get through this work more efficiently.
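To make the benefit concrete, here is a rough back-of-the-envelope sketch in C++. It assumes an idealized pipeline where every stage takes exactly one cycle and nothing ever stalls, which real CPUs do not guarantee. The function names and the stage count used below are made up for illustration.

```cpp
#include <cstdint>

// Hypothetical model: every instruction passes through `stages` stages,
// each stage takes exactly one cycle, and there are no stalls.

// Without pipelining, an instruction finishes all its stages
// before the next instruction is allowed to start.
std::uint64_t cycles_unpipelined(std::uint64_t instructions, std::uint64_t stages) {
    return instructions * stages;
}

// With pipelining, the first instruction needs `stages` cycles to get through,
// and after that one more instruction completes on every cycle.
std::uint64_t cycles_pipelined(std::uint64_t instructions, std::uint64_t stages) {
    if (instructions == 0) {
        return 0;
    }
    return stages + (instructions - 1);
}
```

Under these assumptions, `cycles_unpipelined(1000, 5)` gives 5000 cycles while `cycles_pipelined(1000, 5)` gives 1004, all on a single thread.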

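So in fact, the CPU is processing multiple instructions at the same time, but each part of the pipeline is used by only one instruction at any given moment.
Pipelining also introduces hazards, such as a read-after-write dependency, where an instruction needs a result that an earlier instruction has not finished producing yet. There are ways to deal with this (e.g. inserting nop instructions). This is what separates the two loops: the two sums do not depend on each other and can overlap in the pipeline, while each xor has to wait for the result of the previous one.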
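The loops from the question might look roughly like the sketch below. The original code is not shown here, so the function names, element types, and the `Sums` struct are assumptions. The third function is the hand-written version of the "two variables" idea from the question. Nothing here measures timing, and whether a compiler performs that split on its own depends on the compiler and its flags.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Sums {
    std::uint64_t a;
    std::uint64_t b;
};

// Two independent accumulators: the add into `s.a` never reads `s.b`,
// so the two dependency chains can overlap in the pipeline.
Sums sum_two_lists(const std::vector<std::uint32_t>& xs,
                   const std::vector<std::uint32_t>& ys) {
    Sums s{0, 0};
    const std::size_t n = std::min(xs.size(), ys.size());
    for (std::size_t i = 0; i < n; ++i) {
        s.a += xs[i];
        s.b += ys[i];
    }
    return s;
}

// One accumulator: every xor reads the result of the previous xor,
// forming a single long read-after-write chain.
std::uint32_t xor_one_chain(const std::vector<std::uint32_t>& xs,
                            const std::vector<std::uint32_t>& ys) {
    std::uint32_t acc = 0;
    const std::size_t n = std::min(xs.size(), ys.size());
    for (std::size_t i = 0; i < n; ++i) {
        acc ^= xs[i];
        acc ^= ys[i];
    }
    return acc;
}

// Manual split: two independent xor chains, combined once after the loop.
// Integer xor is associative and commutative, so the result is the same.
std::uint32_t xor_two_chains(const std::vector<std::uint32_t>& xs,
                             const std::vector<std::uint32_t>& ys) {
    std::uint32_t acc_a = 0;
    std::uint32_t acc_b = 0;
    const std::size_t n = std::min(xs.size(), ys.size());
    for (std::size_t i = 0; i < n; ++i) {
        acc_a ^= xs[i];
        acc_b ^= ys[i];
    }
    return acc_a ^ acc_b;
}
```

All three functions run on a single thread. The only difference is how many independent chains of dependent instructions the CPU gets to work on.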

This has nothing to do with multithreading, which is a higher-level concept. Here we are at the lowest level, i.e. how the CPU executes individual instructions.
The link provided in the thread you referenced is a good starting point (link).
