DeepSeek open-sources DeepEP, a library for MoE training and inference
DeepEP: an efficient expert-parallel communication library - deepseek-ai/DeepEP
For latency-sensitive inference decoding, DeepEP includes a set of low-latency kernels that use pure RDMA to minimize delays. These kernels involve an implicit CPU wait for the GPU's received-count signal. With the library's receiving-hook interface, RDMA network traffic proceeds in the background without occupying any GPU SMs needed for computation.
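The general pattern behind a receiving hook can be sketched as follows. This is a minimal, CPU-only illustration of the idea (background transfer plus a deferred wait), not DeepEP's actual API: `dispatch_async`, `hook`, and the buffer are hypothetical names, and a Python thread stands in for the RDMA transfer.

```python
import threading

def dispatch_async(payload, buffer):
    """Start a background 'RDMA-like' transfer; return a hook to wait on it."""
    done = threading.Event()

    def transfer():
        buffer.extend(payload)   # simulate data landing in the receive buffer
        done.set()               # simulate the received-count signal

    threading.Thread(target=transfer).start()

    def hook():
        done.wait()              # the implicit CPU wait happens here, not earlier
        return buffer

    return hook

recv_buffer = []
hook = dispatch_async([1, 2, 3], recv_buffer)
partial = sum(x * x for x in range(10))  # overlapped computation runs meanwhile
received = hook()                        # block only when the data is needed
```

The point of returning a hook instead of blocking inside the dispatch call is that the caller decides when to pay the wait, so compute kernels can run in the gap.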
Or read this on Hacker News