We got 207 tok/s with Qwen3.5-27B on an RTX 3090


Lucebox optimization hub (Luce-Org/lucebox-hub): hand-tuned LLM inference, built for specific consumer hardware.

Related news:

From RTX to Spark: NVIDIA Accelerates Gemma 4 for Local Agentic AI

Qwen3.5-397B at 4.74 tok/s using 5.9GB RAM

Show HN: Llama 3.1 70B on a single RTX 3090 via NVMe-to-GPU bypassing the CPU