SVDQuant: 4-Bit Quantization Powers 12B Flux on a 16GB 4090 GPU with 3x Speedup


SVDQuant is a new post-training quantization paradigm for diffusion models that quantizes both the weights and activations of FLUX.1 to 4 bits, achieving a 3.5× memory reduction and an 8.7× latency reduction on a 16GB laptop 4090 GPU.
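As a back-of-envelope check on the memory claim, here is a sketch under assumed sizes (12B parameters, 2 bytes per BF16 weight, 0.5 bytes per INT4 weight); none of these byte widths come from the post itself:

```python
# Rough weight-memory estimate; parameter count and byte widths
# are assumptions for illustration, not figures from the paper.
params = 12e9                   # FLUX.1 is a ~12B-parameter model
bf16_gb = params * 2.0 / 1e9    # 2 bytes/param in BF16 -> ~24 GB
int4_gb = params * 0.5 / 1e9    # 0.5 bytes/param in INT4 -> ~6 GB
print(f"{bf16_gb:.0f} GB -> {int4_gb:.0f} GB ({bf16_gb / int4_gb:.1f}x)")
```

Weights alone give about 4×; an end-to-end figure of 3.5× is plausible once higher-precision components and runtime buffers are counted.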

However, scaling brings challenges: these models become computationally heavy, demanding high memory and longer processing times, which makes them prohibitive for real-time applications. [Figure: visual examples of the INT4 FLUX.1-dev model with LoRAs applied in five distinct styles: Realism, Ghibsky Illustration, Anime, Children Sketch, and Yarn Art.]
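To make the approach concrete, here is a minimal sketch in the SVDQuant spirit: pull the dominant singular directions of a weight matrix into a 16-bit low-rank branch via SVD, then quantize the residual to 4 bits. The function name, rank, and per-tensor scaling below are illustrative assumptions, not the authors' implementation:

```python
import torch

def svd_lowrank_quantize(W: torch.Tensor, rank: int = 32, n_bits: int = 4):
    # Low-rank branch: keep the top-`rank` singular directions in 16-bit.
    U, S, Vh = torch.linalg.svd(W.float(), full_matrices=False)
    L1 = (U[:, :rank] * S[:rank]).half()   # (out, rank)
    L2 = Vh[:rank, :].half()               # (rank, in)

    # Residual after removing the dominant (outlier-heavy) component.
    R = W.float() - L1.float() @ L2.float()

    # Symmetric per-tensor 4-bit quantization of the residual.
    # (Stored as int8 here for simplicity; real kernels pack two
    # 4-bit values per byte.)
    qmax = 2 ** (n_bits - 1) - 1           # 7 for signed 4-bit
    scale = R.abs().max() / qmax
    R_q = torch.clamp((R / scale).round(), -qmax - 1, qmax).to(torch.int8)
    return L1, L2, R_q, scale

W = torch.randn(512, 512)
L1, L2, R_q, scale = svd_lowrank_quantize(W)
W_hat = L1.float() @ L2.float() + R_q.float() * scale
print(f"max reconstruction error: {(W - W_hat).abs().max().item():.4f}")
```

The low-rank branch absorbs the weight outliers that make plain 4-bit quantization lossy; the published method also first migrates activation outliers into the weights via smoothing, which this sketch omits.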

Read more on: GPU, 12B, 3x speedup

Related news:

Nvidia is making a new PC CPU to take on Intel, AMD, and Apple, says report | The gaming GPU maker reportedly has a new consumer PC Arm CPU in the works, set to challenge the AMD Ryzen and Intel Core duopoly.

Open-Source PowerVR Driver Being Extended For The Imagination BXS-4-64 MC1 GPU

No-Nvidias networking club convenes in search of open GPU interconnect