Why didn't x86 implement direct core-to-core messaging assembly/CPU instructions?
After years of serious development, CPUs gained many cores, distributed blocks of cores across multiple chiplets, NUMA systems, etc., but a piece of data still has to pass not only through the L1 cache (if both threads are SMT siblings on the same core) but also through some atomic/mutex synchronization primitive that is not accelerated by hardware. I wonder why didn’t Intel…
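To make the question concrete, here is a minimal sketch of what "messaging" between two cores looks like in software today, assuming plain C++11 atomics. The names `Mailbox`, `send`, `receive`, and `pass_between_threads` are hypothetical and only for illustration. The threads are not pinned, so the OS decides which cores they actually run on.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical one-slot mailbox shared by two threads (illustrative names).
struct Mailbox {
    int payload = 0;                // plain data, published via the flag below
    std::atomic<bool> full{false};  // synchronization flag
};

// Producer: write the payload, then publish it with a release store.
void send(Mailbox& box, int value) {
    box.payload = value;
    box.full.store(true, std::memory_order_release);
}

// Consumer: spin until the flag is observed, then read the payload.
// The acquire load pairs with the release store in send().
int receive(Mailbox& box) {
    while (!box.full.load(std::memory_order_acquire)) {
        std::this_thread::yield();  // give up the time slice while waiting
    }
    return box.payload;
}

// Pass one value from a producer thread to the calling thread.
int pass_between_threads(int value) {
    Mailbox box;
    std::thread producer([&box, value] { send(box, value); });
    int received = receive(box);
    producer.join();
    assert(received == value);  // sanity check on the hand-off
    return received;
}
```

Calling `pass_between_threads(42)` should return 42. There is no "send to core N" step anywhere: the producer writes to ordinary memory, and the cache coherence protocol moves the cache line to whichever core the consumer is polling from.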