Creating invariant floating-point accumulators

One of the requirements we have for astcenc is ensuring that the output of the codec is consistent across instruction sets. Users quite like having the choice of NEON, SSE4.1, or AVX2 SIMD when their machine supports it – faster compression is always good – but no dev team wants a game build that looks different just because a different machine was used to build it.

Before I go off on a ramble about floating-point maths, the important learning point of this blog is that by the end of it you’ll know how to write an invariant, vector-length-independent accumulator implementation, and some of the common pitfalls that occur along the way … Unpacking a vector so that horizontal operations can be done in linear order to match the scalar behavior throws away the performance benefits we wanted to gain by using SIMD in the first place. While not related to accumulators, when @aras_p contributed the new vector-length-agnostic SIMD last year, he found an issue caused by code responsible for finding the index of the smallest value in a data set.
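To make the accumulator idea concrete, here is a minimal sketch of one way to get a sum that does not depend on vector width. It is written as plain scalar C++ that emulates the lane behavior, not as astcenc's actual code: the name `invariant_sum` and the `Width` template parameter are hypothetical. The assumption is that every build keeps exactly four partial sums, folds wider vectors into them four lanes at a time starting from the lowest lanes, and performs the final horizontal add in one fixed order.

```cpp
#include <cstddef>

// Hypothetical helper, not astcenc's API: sums `count` floats while emulating
// a SIMD unit that is `Width` lanes wide (Width must be a multiple of 4).
template <std::size_t Width>
float invariant_sum(const float* data, std::size_t count)
{
    static_assert(Width % 4 == 0, "Width must be a multiple of 4");

    // Always keep exactly four partial sums, whatever the vector width.
    float lanes[4] = { 0.0f, 0.0f, 0.0f, 0.0f };

    for (std::size_t base = 0; base < count; base += Width)
    {
        // Fold each group of four lanes of the wide vector into the
        // accumulator, lowest group first, so every lane sees its values
        // in the same order as a 4-wide build would.
        for (std::size_t group = 0; group < Width; group += 4)
        {
            for (std::size_t lane = 0; lane < 4; lane++)
            {
                std::size_t index = base + group + lane;
                // Elements past the end are treated as zero padding.
                float value = (index < count) ? data[index] : 0.0f;
                lanes[lane] += value;
            }
        }
    }

    // Final horizontal add in one fixed order, shared by all widths.
    return (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
}
```

With this structure `invariant_sum<4>` and `invariant_sum<8>` perform the same additions in the same order, so they should return bit-identical results. The result can still differ from a simple left-to-right scalar loop, and the guarantee only holds if the compiler is not allowed to reassociate floating-point maths (for example, no fast-math style options).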