Going faster than memcpy


While profiling Shadesmar a couple of weeks ago, I noticed that for large binary unserialized messages (>512 kB), most of the execution time is spent copying the message (using memcpy) between process memory and shared memory, and back.
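To make the bottleneck concrete, here is a minimal sketch of the copy path the profiler flagged. The function names are illustrative, not Shadesmar's actual internals: a message gets memcpy'd into a shared-memory segment on publish and memcpy'd back out on receive, so every message pays for two full copies.

```cpp
#include <cstddef>
#include <cstring>

// Publish: copy the message from process memory into the shared-memory segment.
void publish(void *shm_segment, const void *msg, std::size_t size) {
  std::memcpy(shm_segment, msg, size);      // process memory -> shared memory
}

// Receive: copy the message from the shared-memory segment back into process memory.
void receive(void *msg_out, const void *shm_segment, std::size_t size) {
  std::memcpy(msg_out, shm_segment, size);  // shared memory -> process memory
}
```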

To make it easy to integrate custom memory copying logic into the library, I introduced the concept of a Copier in this commit. The original reason for introducing this construct was to allow cross-device usage, where a custom copier could transfer messages between CPU and GPU. With this abstraction in place, different copy implementations can be swapped in and compared: for small to medium message sizes, unrolled AVX absolutely dominates, but for larger messages it is slower than the streaming alternatives.
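For illustration, here is a minimal sketch of what such a Copier abstraction might look like. The method names and signatures are assumptions for this post, not necessarily Shadesmar's exact API; the point is that the transport delegates allocation and both copy directions to a pluggable object, so a GPU copier could call cudaMemcpy where the default one calls memcpy.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical Copier interface: allocation plus both copy directions
// are virtual, so alternative strategies can be plugged in.
class Copier {
 public:
  virtual ~Copier() = default;
  virtual void *alloc(std::size_t size) = 0;
  virtual void dealloc(void *ptr) = 0;
  virtual void user_to_shm(void *shm_dst, const void *user_src, std::size_t size) = 0;
  virtual void shm_to_user(void *user_dst, const void *shm_src, std::size_t size) = 0;
};

// Default implementation backed by plain malloc/memcpy.
class DefaultCopier : public Copier {
 public:
  void *alloc(std::size_t size) override { return std::malloc(size); }
  void dealloc(void *ptr) override { std::free(ptr); }
  void user_to_shm(void *dst, const void *src, std::size_t size) override {
    std::memcpy(dst, src, size);
  }
  void shm_to_user(void *dst, const void *src, std::size_t size) override {
    std::memcpy(dst, src, size);
  }
};
```

The "streaming" alternatives mentioned above refer to non-temporal stores, which bypass the cache and tend to win on buffers too large to stay resident anyway. A hedged sketch of that technique (not the exact copier used in the benchmarks) using AVX intrinsics:

```cpp
#include <immintrin.h>
#include <cstddef>

// Streaming (non-temporal) copy: stores bypass the cache, which helps for
// large buffers that will not be read again soon. The destination is assumed
// to be 32-byte aligned and the size a multiple of 32 for brevity; a real
// implementation would handle the unaligned head and tail with plain memcpy.
// Compile with AVX enabled (e.g. -mavx).
void stream_copy_avx(void *dst, const void *src, std::size_t size) {
  auto *d = static_cast<__m256i *>(dst);
  const auto *s = static_cast<const __m256i *>(src);
  for (std::size_t i = 0; i < size / sizeof(__m256i); ++i) {
    __m256i chunk = _mm256_loadu_si256(s + i);  // regular load from source
    _mm256_stream_si256(d + i, chunk);          // non-temporal store to destination
  }
  _mm_sfence();  // make the weakly-ordered streaming stores globally visible
}
```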
