An optimizing compiler doesn't help much with long instruction dependencies
Does it matter if we are compiling with optimizations off (O0) or optimizations on (O3) if the problem is memory bound? Let's find out...
On imaginary perfect hardware, where the runtime is proportional to the instruction count and doesn’t depend on memory at all, the graph could look like this:

Of course, being three times faster is something one should nevertheless appreciate, especially since very memory-intensive code like the above doesn’t appear too often (but it does appear; e.g., the above loop is very similar to looking up data in a hash map).

Even though the problem itself is memory intensive, it has a lot of instruction-level parallelism and can be executed relatively fast.
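
To make the contrast between a long dependency chain and instruction-level parallelism concrete, here is a minimal C++ sketch. It is not the article's benchmark: the function names `chase` and `gather_sum` and the array layouts are assumptions chosen for illustration. In `chase`, each load's address comes from the previous load, so the accesses must happen one after another. In `gather_sum`, which resembles the hash-map-style lookup mentioned above, every address is known up front, so the hardware can keep several cache misses in flight at once.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Dependent chain (hypothetical example): the index for the next load is the
// value returned by the current load, so no two loads can overlap.
std::size_t chase(const std::vector<std::size_t>& next,
                  std::size_t start, std::size_t steps) {
    std::size_t i = start;
    for (std::size_t s = 0; s < steps; ++s) {
        assert(i < next.size());  // guard against a malformed chain
        i = next[i];
    }
    return i;
}

// Independent loads (hypothetical example): each address depends only on
// index[k], so many lookups can be issued before earlier ones complete.
std::uint64_t gather_sum(const std::vector<std::uint32_t>& data,
                         const std::vector<std::size_t>& index) {
    std::uint64_t sum = 0;
    for (std::size_t k = 0; k < index.size(); ++k) {
        sum += data[index[k]];
    }
    return sum;
}
```

With arrays much larger than the last-level cache and a randomized access order, one would expect both loops to be memory bound at either optimization level, with the dependent chain being the slower of the two. That expectation is a hypothesis to measure on your own hardware, not a result reported here.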