Summing ASCII encoded integers on Haswell at almost the speed of memcpy
“Print the sum of 50 million ASCII-encoded integers uniformly sampled from [0, 2³¹−1], separated by a single new line and sent to standard input.” On the surface, a trivial problem. But what if you wanted to go as fast as possible? I’m currently one of the top-ranked competitors in exactly that kind of challenge, and in this post I’ll show you a sketch of my best-performing solution.
I’ll leave out some of the µoptimizations and the look-up table generation to keep this post short and easier to understand, and to avoid completely obliterating the HighLoad leaderboard. Instead, we’ll iterate over 32-byte chunks of the input using SIMD, from back to front, keeping track of the sum of the digits in each decimal place.
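To make the bookkeeping concrete before any SIMD enters the picture, here is a minimal scalar sketch in Python of the "sum of the digits in each decimal place" idea. It is not the vectorized solution: it handles one byte at a time rather than 32-byte chunks, and it assumes the input contains only ASCII digits and newline bytes. The names `sum_ascii_ints` and `place_sums` are made up for this illustration.

```python
def sum_ascii_ints(data: bytes) -> int:
    """Sum newline-separated decimal integers by tallying digits per decimal place."""
    place_sums = [0] * 10  # values below 2**31 have at most 10 decimal digits
    place = 0              # decimal place of the next digit (0 = ones)

    # Walk from back to front, so the first digit seen after a newline
    # is always the ones digit of the number that precedes that newline.
    for byte in reversed(data):
        if byte == 0x0A:   # newline: the next digit starts a new number
            place = 0
        else:
            place_sums[place] += byte - 0x30  # ASCII '0' is 0x30
            place += 1

    # Apply the powers of ten once, at the very end.
    return sum(total * 10**k for k, total in enumerate(place_sums))
```

Going back to front is what makes the decimal place of each digit known without first finding the length of the number, and deferring the multiplication by powers of ten leaves only additions in the inner loop. For example, `sum_ascii_ints(b"12\n345\n6\n")` tallies 13 in the ones place, 5 in the tens place and 3 in the hundreds place, which combine to 363.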