Device Memory TCP Nears The Finish Line For More Efficient Networked Accelerators
A year ago, Google engineers posted experimental Linux code for Device Memory TCP, which transfers data between GPUs/accelerators and network devices more efficiently by not staging it in a host CPU memory buffer.
Given the high memory and network bandwidth demands of AI training, which spans many interconnected systems built around TPUs, GPUs, NPUs, and other accelerators, the goal has been to avoid memory copies through host system memory when sending or receiving data from those discrete devices across the network. Device Memory TCP now appears to be wrapping up: the prep patches were queued last week into the networking subsystem's "net-next.git" tree. There is still a week or two for the Device Memory TCP work itself to be queued into net-next ahead of the v6.11 merge window; otherwise it looks set to arrive in v6.12, which would be notable given that v6.12 will likely be the 2024 LTS kernel version.
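For a sense of what is being eliminated, here is a minimal sketch of the conventional "bounce buffer" transmit path that Device Memory TCP avoids. This is purely illustrative, not the kernel's actual devmem API: device memory is simulated with a heap allocation, and the device-to-host transfer (which on real hardware would be a DMA across PCIe) is simulated with memcpy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Illustrative only: the conventional send path that Device Memory TCP
 * eliminates. Payload sitting in accelerator memory must first be
 * "bounced" into a host RAM staging buffer before the kernel can
 * transmit it. Device memory is simulated here with an ordinary
 * pointer; the memcpy stands in for a device-to-host DMA. */
ssize_t send_via_host_bounce(int sock, const char *dev_mem, size_t len)
{
    char *host_buf = malloc(len);      /* staging buffer in host RAM */
    if (!host_buf)
        return -1;

    /* Copy #1: device -> host. This is the copy (and the associated
     * host memory bandwidth) that Device Memory TCP is designed to
     * remove by letting the NIC DMA directly to/from device memory. */
    memcpy(host_buf, dev_mem, len);

    /* Copy #2: host buffer -> kernel socket path. */
    ssize_t n = send(sock, host_buf, len, 0);

    free(host_buf);
    return n;
}
```

At the bandwidths involved in multi-accelerator AI training, skipping the staging copy saves both host memory bandwidth and CPU time, which is the motivation behind the patches.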
Read the original article on Phoronix.