
Gemma 3 QAT Models: Bringing AI to Consumer GPUs


Explore Gemma 3 models now offering state-of-the-art AI performance on consumer GPUs with new int4 quantized versions optimized with Quantization-Aware Training (QAT).

Delivering state-of-the-art performance, Gemma 3 quickly established itself as a leading model capable of running on a single high-end GPU like the NVIDIA H100 using its native BFloat16 (BF16) precision. To make Gemma 3 even more accessible, we are announcing new versions optimized with Quantization-Aware Training (QAT) that dramatically reduce memory requirements while maintaining high quality. Note that while this chart uses BF16 for a fair comparison, deploying the very largest models often requires lower-precision formats like FP8 as a practical necessity: they shrink the immense hardware footprint (for example, the number of GPUs needed), at the cost of a potential performance trade-off.
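To make the memory savings concrete, here is a minimal back-of-the-envelope sketch (not official figures) that estimates weight-only memory at different precisions. The `approx_weight_memory_gb` helper and the 27-billion-parameter figure for Gemma 3 27B are illustrative assumptions; real deployments also need memory for activations, the KV cache, and runtime overhead.

```python
def approx_weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Rough weight-only footprint in decimal gigabytes.

    Illustrative simplification: assumes every parameter is stored at the
    given precision and ignores activations, KV cache, and runtime overhead.
    """
    return num_params * bits_per_param / 8 / 1e9

# Gemma 3 27B as an example size (~27 billion parameters):
bf16_gb = approx_weight_memory_gb(27e9, 16)  # native BF16, 2 bytes/param
int4_gb = approx_weight_memory_gb(27e9, 4)   # QAT int4, 0.5 bytes/param
print(f"BF16: {bf16_gb:.1f} GB, int4: {int4_gb:.1f} GB")
# → BF16: 54.0 GB, int4: 13.5 GB
```

The 4x reduction in weight memory is what moves a model of this size from data-center accelerators into the range of a single consumer GPU.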


