Making a parallel Rust workload 10x faster with (or without) Rayon
In a previous post, I showed how to use the Rayon framework in Rust to automatically parallelize a loop computation across multiple CPU cores. Disappointing...
futex (a synchronization primitive used to implement mutexes) represents the vast majority of the syscall count and time, mmap (and a couple of munmap calls) manages memory allocations, and write prints my program’s progress and results to the standard output.

The physical reality of hardware is that RAM is slow, so in practice CPUs use a combination of very fast registers directly on the computing units and, typically, 3 levels of cache.

This means that if the load is not balanced, whether because we’re unlucky or because of adversarial inputs, there is a risk that all the heavy items end up in a single serial job without any effective parallelism.
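To make the load-balancing risk concrete, here is a minimal sketch that only simulates scheduling instead of spawning threads. The function names (`static_chunk_loads`, `greedy_loads`) and the cost numbers are hypothetical, chosen for illustration; neither function is how Rayon actually schedules work. The first splits items into equal-sized contiguous chunks, the second hands each item to whichever worker currently has the least work.

```rust
/// Split `costs` into `jobs` contiguous chunks of (nearly) equal length and
/// return the total cost that lands in each chunk. `jobs` must be non-zero.
fn static_chunk_loads(costs: &[u64], jobs: usize) -> Vec<u64> {
    // Ceiling division, clamped to 1 because `chunks(0)` would panic.
    let chunk_len = ((costs.len() + jobs - 1) / jobs).max(1);
    costs
        .chunks(chunk_len)
        .map(|chunk| chunk.iter().sum::<u64>())
        .collect()
}

/// Assign each item, in order, to the worker with the smallest load so far.
/// This is a simple stand-in for dynamic scheduling, not Rayon's algorithm.
fn greedy_loads(costs: &[u64], jobs: usize) -> Vec<u64> {
    let mut loads = vec![0u64; jobs];
    for &cost in costs {
        // `min_by_key` returns the first worker among equally loaded ones.
        let idx = (0..jobs).min_by_key(|&i| loads[i]).unwrap();
        loads[idx] += cost;
    }
    loads
}

fn main() {
    // Hypothetical adversarial input: 8 heavy items up front, 24 light ones after.
    let mut costs = vec![100u64; 8];
    costs.extend(std::iter::repeat(1u64).take(24));

    let fixed = static_chunk_loads(&costs, 4);
    let greedy = greedy_loads(&costs, 4);

    // The slowest job bounds the wall-clock time of the whole computation.
    println!("static chunks: {:?}", fixed); // [800, 8, 8, 8]
    println!("greedy:        {:?}", greedy); // [206, 206, 206, 206]
}
```

With equal-sized chunks, one job carries 800 of the 824 total cost units, so the run is almost serial even though four jobs exist. Spreading the items dynamically brings the slowest job down to 206.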