Beyond OpenMP in C++ and Rust: Taskflow, Rayon, Fork Union
TL;DR: Most C++ and Rust thread-pool libraries leave significant performance on the table, often running 10× slower than OpenMP on classic fork-join workloads and micro-benchmarks. So I’ve drafted a minimal ~300-line library called Fork Union that lands within 20% of OpenMP. It does not use advanced NUMA tricks; it uses only the C++ and Rust standard libraries and has no other dependencies.
OpenMP has been the industry workhorse for coarse-grained parallelism in C and C++ for decades. I lean on it heavily in projects like USearch, yet I avoid it in larger systems because:
Having to synchronize ~200 physical CPU cores across 2-8 sockets and potentially dozens of NUMA nodes around a shared global memory allocator means, in practice, that you can’t have predictable performance. There may still be subtle memory-ordering bugs lurking in Fork Union, but the core observations should hold regardless of language or framework: dodge mutexes, dynamic queues, likely-pessimistic CAS paths, and false sharing.
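To make those observations concrete, here is a minimal sketch of a fork-join reduction in Rust that uses only the standard library. It is not Fork Union's API: `PaddedSlot` and `parallel_sum` are hypothetical names, the 64-byte cache-line size is an assumption (some CPUs prefetch in 128-byte pairs), and it spawns scoped threads on every call, where a real pool would keep its workers alive. What it illustrates is the shape of the advice: static slicing instead of a dynamic task queue, no mutex or CAS on the hot path, and per-thread results padded onto separate cache lines to avoid false sharing.

```rust
use std::thread;

/// One result slot per worker, aligned to its own cache line so that
/// neighbouring workers never write to the same line (no false sharing).
/// The 64-byte line size is an assumption, not a portable guarantee.
#[repr(align(64))]
struct PaddedSlot(u64);

/// Hypothetical helper: sums `data` using `threads` workers.
/// Work is split statically into contiguous chunks, so there is no shared
/// queue, no mutex, and no atomic read-modify-write on the hot path.
fn parallel_sum(data: &[u64], threads: usize) -> u64 {
    let threads = threads.max(1);
    let mut slots: Vec<PaddedSlot> = (0..threads).map(|_| PaddedSlot(0)).collect();

    // Ceiling division; `max(1)` keeps `chunks` valid for empty input.
    let chunk = ((data.len() + threads - 1) / threads).max(1);

    // Fork: each worker gets exclusive access to one slot and one chunk.
    thread::scope(|s| {
        for (slot, part) in slots.iter_mut().zip(data.chunks(chunk)) {
            s.spawn(move || {
                slot.0 = part.iter().sum();
            });
        }
    });
    // Join: `thread::scope` returns only after every worker has finished,
    // so reading the slots here needs no further synchronization.

    slots.iter().map(|s| s.0).sum()
}
```

Calling `parallel_sum(&data, 4)` on a `Vec<u64>` splits it into at most four contiguous chunks. Because each worker writes only to its own padded slot, the only synchronization point is the join at the end of the scope.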