We got 207 tok/s with Qwen3.5-27B on an RTX 3090

Lucebox optimization hub: hand-tuned LLM inference, built for specific consumer hardware. - Luce-Org/lucebox-hub

Related news:

From RTX to Spark: NVIDIA Accelerates Gemma 4 for Local Agentic AI

Qwen3.5-397B at 4.74 tok/s using 5.9GB RAM

Show HN: Llama 3.1 70B on a single RTX 3090 via NVMe-to-GPU bypassing the CPU