Counting bytes faster than you'd think possible
Summing ASCII Encoded Integers on Haswell at the Speed of memcpy turned out more popular than I expected, which inspired me to take on another challenge on HighLoad: Counting uint8s. I’m currently only #13 on the leaderboard, ~7% behind #1, but I have already learned some interesting things.
I was reading through the typo-ridden Intel Optimization Manual looking for anything memory-related when, on page 788, I encountered a description of the 4 hardware prefetchers.
[Figures: Sequential Access; Interleaved Access (8 pages)]
This improves the score on HighLoad by some 15%, but if your kernel is even more memory-bound, say you just vpaddb the bytes to find their sum modulo 256, you can get up to a 30% gain with this.
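To make the "interleaved access (8 pages)" idea concrete, here is a minimal sketch in portable C. It is not the author's code: the function names, the 64-byte cache line, the 4096-byte page, and the exact loop order are all assumptions chosen for illustration. Instead of walking the buffer front to back, it reads one cache line from each of 8 consecutive pages in turn, presumably so that more than one hardware prefetch stream is active at once. The byte sum stands in for the vpaddb kernel: uint8_t addition wraps modulo 256.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed sizes, typical for x86-64; not taken from the article. */
enum { LINE_BYTES = 64, PAGE_BYTES = 4096, PAGES_PER_BLOCK = 8 };
#define BLOCK_BYTES ((size_t)PAGE_BYTES * PAGES_PER_BLOCK)

/* Baseline: sequential walk. uint8_t arithmetic wraps modulo 256. */
uint8_t sum_sequential(const uint8_t *data, size_t len) {
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = (uint8_t)(sum + data[i]);
    return sum;
}

/* Hypothetical interleaved walk: within each 8-page block, visit cache
 * line 0 of pages 0..7, then cache line 1 of pages 0..7, and so on.
 * Every byte is still read exactly once, so the result is unchanged. */
uint8_t sum_interleaved(const uint8_t *data, size_t len) {
    uint8_t sum = 0;
    size_t full = len - len % BLOCK_BYTES;  /* bytes covered by whole blocks */

    for (size_t base = 0; base < full; base += BLOCK_BYTES) {
        for (size_t line = 0; line < PAGE_BYTES; line += LINE_BYTES) {
            for (size_t page = 0; page < PAGES_PER_BLOCK; page++) {
                const uint8_t *p = data + base + page * PAGE_BYTES + line;
                for (size_t i = 0; i < LINE_BYTES; i++)
                    sum = (uint8_t)(sum + p[i]);
            }
        }
    }

    /* Tail that does not fill a whole block: fall back to sequential. */
    for (size_t i = full; i < len; i++)
        sum = (uint8_t)(sum + data[i]);
    return sum;
}
```

Both functions return the same value for any input; only the order of memory accesses differs. Whether the interleaved order is actually faster depends on the CPU, the compiler's vectorization, and the buffer's alignment, so it needs to be measured on the target machine.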