Operation Costs in CPU Clock Cycles (2016)
Quote: "Back in 80s, it was possible to calculate the speed of the program just by looking at assembly."
Another quote: "keep in mind that these days compilers tend to ignore inline specifications more often than not"
Further discussion of such techniques (known as “out of order execution”), while being Really Interesting, is going to be way too hardware-related (what about “register renaming”, which happens under the hood of the CPU to reduce dependencies which prevent out-of-order from working efficiently?).
If your system happens to be memory-bandwidth-bound, numbers can get EXTREMELY high; still, from what I’ve seen, having memory bandwidth overloaded by writes is a very rare occurrence, so I didn’t reflect it on the diagram.
IMO it will be next-to-impossible to have latencies of a CAS operation on “remote” memory less than the round-trip of HyperTransport between the sockets, which in turn is comparable to the cost of a NUMA L3 cache read.