Gemma 3 QAT Models: Bringing AI to Consumer GPUs
Explore Gemma 3 models now offering state-of-the-art AI performance on consumer GPUs with new int4 quantized versions optimized with Quantization Aware Training (QAT).
Delivering state-of-the-art performance, Gemma 3 quickly established itself as a leading model capable of running on a single high-end GPU like the NVIDIA H100 using its native BFloat16 (BF16) precision. To make Gemma 3 even more accessible, we are announcing new versions optimized with Quantization-Aware Training (QAT) that dramatically reduce memory requirements while maintaining high quality. Note that while these comparisons use BF16 for fairness, deploying the very largest models often means using lower-precision formats like FP8 as a practical necessity: it reduces immense hardware requirements (such as the number of GPUs), at the cost of a potential performance trade-off for feasibility.
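To make the memory savings concrete, here is a minimal sketch of symmetric int4 quantization and the byte arithmetic behind it. This is an illustration only, not Gemma's actual QAT pipeline: the function names, the single per-tensor scale, and the 27B parameter count are all assumptions for the example.

```python
# Illustrative sketch (NOT Gemma's actual quantization pipeline):
# symmetric int4 quantization of a weight tensor, plus the rough
# memory arithmetic behind the BF16 -> int4 savings.

def quantize_int4(weights):
    """Map floats to signed 4-bit integers in [-8, 7] with one scale."""
    scale = max(abs(w) for w in weights) / 7.0 or 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int4(q, scale):
    """Recover approximate float weights from int4 values."""
    return [v * scale for v in q]

weights = [0.12, -0.5, 0.33, 0.07, -0.21]
q, scale = quantize_int4(weights)
approx = dequantize_int4(q, scale)

# Memory estimate: BF16 stores 2 bytes per weight, int4 stores 0.5,
# so weight storage shrinks roughly 4x (ignoring scales/metadata).
params = 27e9  # hypothetical 27B-parameter model
print(f"BF16: {params * 2 / 1e9:.1f} GB")
print(f"int4: {params * 0.5 / 1e9:.1f} GB")
```

QAT differs from this post-hoc approach in that the quantization step is simulated during training, so the model learns weights that remain accurate after rounding to int4.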