DeepSeek open-sources DeepEP, a library for MoE training and inference
DeepEP: an efficient expert-parallel communication library - deepseek-ai/DeepEP
For latency-sensitive inference decoding, DeepEP includes a set of low-latency kernels that use pure RDMA to minimize delays. These kernels involve an implicit CPU wait for the GPU's received-count signal. With the library's receiving-hook interface, RDMA network traffic proceeds in the background without occupying any GPU SMs needed for computation.
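The general pattern behind a receiving hook can be sketched as follows. This is a minimal, CPU-only illustration of the idea (background transfer plus a deferred wait), not DeepEP's actual API: `dispatch_async`, `hook`, and the buffer are hypothetical names, and a Python thread stands in for the RDMA transfer.

```python
import threading

def dispatch_async(payload, buffer):
    """Start a background 'RDMA-like' transfer; return a hook to wait on it."""
    done = threading.Event()

    def transfer():
        buffer.extend(payload)   # simulate data landing in the receive buffer
        done.set()               # simulate the received-count signal

    threading.Thread(target=transfer).start()

    def hook():
        done.wait()              # the implicit CPU wait happens here, not earlier
        return buffer

    return hook

recv_buffer = []
hook = dispatch_async([1, 2, 3], recv_buffer)
partial = sum(x * x for x in range(10))  # overlapped computation runs meanwhile
received = hook()                        # block only when the data is needed
```

The point of returning a hook instead of blocking inside the dispatch call is that the caller decides when to pay the wait, so compute kernels can run in the gap.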
Or read this on Hacker News