Show HN: Tiny-vLLM – high performance LLM inference engine in C++ and CUDA


Build your own high-performance LLM inference engine in C++ and CUDA: a smaller version of vLLM (jmaczan/tiny-vllm).

Read more on:

C++

CUDA

Related news:

NVIDIA CUDA 13.3 Rolls Out CUDA Python 1.0, CUDA Tile For C++

C constructs that still don't work in C++

Show HN: Anyone interested in a tool that helps explore C++ ASTs?