vLLM large scale serving: DeepSeek 2.2k tok/s/H200 with wide-EP
