Do not taunt happy fun branch predictor (2023)
I've been writing a lot of AArch64 assembly, for reasons. I recently came up with a "clever" idea to eliminate one jump from an inner loop, and was surprised to find that it slowed things down.
More specifically, the branch predictor probably keeps an internal stack of function return addresses. An address is pushed whenever a bl is executed and popped to predict the target of the matching ret.

If we know the contents of foo when building this function (and it's shorter than the maximum jump distance), we can remove the bl and ret entirely.

Still, this is a reasonable point to wrap up: we've demystified the initial performance regression, and had some fun hand-writing assembly to go very fast indeed.
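To make the return-address-stack idea concrete, here is a toy model in Python. It is a sketch under stated assumptions, not a description of any real core: the stack depth (16), the overflow behaviour (oldest entry dropped), and the addresses LOOP_HEAD and AFTER_BL are all hypothetical. Each bl pushes the address of the instruction after it, and each ret is "predicted" to go to whatever was popped. When every ret really returns to the address its bl pushed, every prediction hits. When a ret is used to jump somewhere else (here, a hypothetical loop head), every prediction misses.

```python
from collections import deque


class ReturnStackPredictor:
    """Toy return address stack (RAS).

    Hypothetical model: the depth and the drop-oldest overflow policy are
    assumptions for illustration, not parameters of any real CPU.
    """

    def __init__(self, depth: int = 16) -> None:
        # A deque with maxlen silently discards the oldest entry on overflow.
        self.stack = deque(maxlen=depth)
        self.hits = 0
        self.misses = 0

    def on_bl(self, return_addr: int) -> None:
        # A `bl` pushes the address of the instruction following it.
        self.stack.append(return_addr)

    def on_ret(self, actual_target: int) -> None:
        # A `ret` is predicted to go to the most recently pushed address;
        # an empty stack yields no prediction, which counts as a miss.
        predicted = self.stack.pop() if self.stack else None
        if predicted == actual_target:
            self.hits += 1
        else:
            self.misses += 1


# Hypothetical code addresses for a loop that calls a function once per pass.
LOOP_HEAD = 0x1000
AFTER_BL = 0x1008


def simulate(iterations: int, ret_target_matches: bool) -> ReturnStackPredictor:
    """Run `iterations` bl/ret pairs through the toy predictor.

    If ret_target_matches is False, each ret jumps to LOOP_HEAD instead of
    the address its bl pushed (an assumed "clever" shortcut).
    """
    ras = ReturnStackPredictor()
    for _ in range(iterations):
        ras.on_bl(AFTER_BL)
        target = AFTER_BL if ret_target_matches else LOOP_HEAD
        ras.on_ret(target)
    return ras


if __name__ == "__main__":
    for matches in (True, False):
        r = simulate(1000, matches)
        print(f"ret returns to its bl: {matches}: hits={r.hits} misses={r.misses}")
```

In this model the mismatched pattern mispredicts on every iteration, which is the kind of cost that can outweigh the single jump it was meant to save. The Arm architecture describes ret as a branch to a register with a hint that it is a subroutine return, which is what allows a predictor to consult a structure like this one.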