
Bit-permuting 16 u32s at once with AVX-512


The basic trick to apply the same bit-permutation to each of the u32s is to view them as a matrix of 16 rows by 32 columns, transpose it ...
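The sentence above is cut off in this summary, so the following is only a scalar sketch of the idea, assuming the rest of the trick is the "transpose-shuffle-transpose sandwich" named in the next paragraph: after transposing, each row holds one bit position of all 16 words, so permuting bits becomes moving whole rows. The function names and the `perm[i]` = source-bit convention are illustrative assumptions, not taken from the original article, and nothing here uses AVX-512.

```python
def transpose_bits(rows, width):
    """Transpose a bit matrix given as integers.

    `rows` has one integer per row, each `width` bits wide.
    Returns `width` integers, each len(rows) bits wide.
    """
    out = [0] * width
    for r, word in enumerate(rows):
        for c in range(width):
            out[c] |= ((word >> c) & 1) << r
    return out


def permute_bits_batch(words, perm):
    """Apply one bit permutation to every 32-bit word in `words`.

    Assumed convention: destination bit i takes source bit perm[i].
    """
    # 32 rows of len(words) bits: row i holds bit i of every word.
    cols = transpose_bits(words, 32)
    # Moving whole rows is the same as permuting bits in every word at once.
    shuffled = [cols[perm[i]] for i in range(32)]
    # Transpose back to one 32-bit word per input word.
    return transpose_bits(shuffled, len(words))


def permute_bits_scalar(word, perm):
    """Reference: permute the bits of a single 32-bit word."""
    return sum(((word >> perm[i]) & 1) << i for i in range(32))
```

In the vectorized version the row shuffle in the middle is where byte and word permutes do the work; the sketch only shows why the sandwich yields the same result as permuting each word separately.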

Taking the solutions from that AVX512 BPC permute solver and using them as the bread of the transpose-shuffle-transpose sandwich gives a valid solution, but there is room for improvement: the sandwich ends up with 3 back-to-back permutes in the middle, which can perhaps be merged. Keeping the most-significant bit in place simplifies the first "transpose" so that it no longer needs a shuffle at the end. In exchange, instead of shuffling each pair of adjacent bytes the same way (which a permutation of 16-bit elements accomplishes for free), we need to shuffle the low and high halves of a 64-byte vector the same way. That requires some pre-processing of the index vector: duplicate a 32-byte vector into the top and bottom of a 64-byte vector, and add 32 to every byte in the top half. The second not-quite-transpose in the new sandwich, corresponding to a bit-order of 3,4,5,6,7,0,1,2,8, starts with a simple permute: byte-reverse every u64.
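The index-vector pre-processing and the byte-reverse step can be modeled on plain lists. This is a minimal sketch, assuming a full 64-byte byte permute that behaves like `out[i] = src[idx[i] mod 64]` (as the AVX-512 VBMI byte permute does); the helper names are hypothetical and do not come from the article.

```python
def permute_bytes(src, idx):
    """Scalar model of a 64-byte permute: out[i] = src[idx[i] mod 64]."""
    assert len(src) == 64 and len(idx) == 64
    return [src[j % 64] for j in idx]


def widen_index(idx32):
    """Build a 64-entry index vector from a 32-entry one.

    The 32 indices are duplicated into both halves, and 32 is added to
    every entry of the top half, so each half is shuffled the same way
    within itself.
    """
    assert len(idx32) == 32 and all(0 <= j < 32 for j in idx32)
    return list(idx32) + [j + 32 for j in idx32]


def byte_reverse_u64_index():
    """Index vector that reverses the 8 bytes inside each u64 lane."""
    return [(i & ~7) | (7 - (i & 7)) for i in range(64)]
```

For example, `permute_bytes(src, widen_index(idx32))` applies `idx32` to bytes 0..31 and, independently, to bytes 32..63, while `permute_bytes(src, byte_reverse_u64_index())` swaps byte 0 with byte 7, byte 1 with byte 6, and so on within every group of eight.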
