Dividing unsigned 8-bit numbers
Division is quite an expensive operation. For instance, the latency of 32-bit division varies between 10 and 15 cycles on the Cannon Lake CPU, and on Zen4 it ranges from 9 to 14 cycles.
SIMD ISAs usually do not provide integer division; the only known exception is the RISC-V Vector Extension. Unlike in the SSE/AVX2 code, it's easier to actually perform a right shift to place the i-th bit at position zero. Also, there's no explicit use of the ternary logic intrinsic function, but examining the assembly code reveals that the compiler nicely fuses the binary operations.
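The remark about shifting right to place the i-th bit at position zero matches the bit-extraction step of bitwise long division. Below is a minimal scalar sketch in C, assuming that long division is the algorithm being vectorized. The function name `div_u8_long` is hypothetical, and this is not the SIMD code the text refers to; it only illustrates the per-bit step that a vector version would apply to every lane at once.

```c
#include <stdint.h>

/* Bitwise (restoring) long division of unsigned 8-bit numbers.
 * Hypothetical helper for illustration only.
 * The divisor must be non-zero; division by zero is not handled. */
static uint8_t div_u8_long(uint8_t dividend, uint8_t divisor) {
    uint16_t remainder = 0; /* may briefly exceed 8 bits after the shift */
    uint8_t quotient = 0;

    for (int i = 7; i >= 0; i--) {
        /* Shift right so that the i-th bit of the dividend lands at position zero. */
        uint8_t bit = (uint8_t)((dividend >> i) & 1u);

        /* Bring that bit down into the running remainder. */
        remainder = (uint16_t)((remainder << 1) | bit);

        /* If the divisor fits, subtract it and set the i-th quotient bit. */
        if (remainder >= divisor) {
            remainder = (uint16_t)(remainder - divisor);
            quotient |= (uint8_t)(1u << i);
        }
    }
    return quotient;
}
```

The loop always runs eight iterations regardless of the operands, which is the property that makes this scheme a natural fit for SIMD: the comparison and conditional subtraction can be expressed with masks instead of branches.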