TurboQuant model weight compression support added to llama.cpp


Summary: TQ3_1S (3-bit, 4.0 BPW) and TQ4_1S (4-bit, 5.0 BPW) weight quantization using WHT (Walsh-Hadamard transform) rotation + Lloyd-Max centroids. V2.1 fused Metal kernel: zero threadgroup memory, cooperative SIMD rotation...
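The summary names two building blocks, a WHT rotation and Lloyd-Max centroids. The sketch below shows how they could fit together for a single block of weights. It is not the llama.cpp implementation: the 32-weight block size, the single RMS scale per block, the Gaussian training data, and every function name are assumptions made for illustration. For reference, 4.0 BPW with 3-bit codes leaves 1 bit per weight for metadata, which would be 32 bits per block under the assumed block size.

```python
import math
import random


def fwht(values):
    """Orthonormal fast Walsh-Hadamard transform (length must be a power of two)."""
    out = list(values)
    n = len(out)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    norm = 1.0 / math.sqrt(n)
    return [v * norm for v in out]


def nearest(centroids, x):
    """Index of the centroid closest to x."""
    return min(range(len(centroids)), key=lambda j: abs(centroids[j] - x))


def lloyd_max(samples, levels, iterations=30):
    """1-D Lloyd-Max: alternate nearest-centroid assignment and mean update."""
    data = sorted(samples)
    # Start from evenly spaced quantiles of the training samples.
    centroids = [data[(2 * k + 1) * len(data) // (2 * levels)] for k in range(levels)]
    for _ in range(iterations):
        sums = [0.0] * levels
        counts = [0] * levels
        for x in data:
            j = nearest(centroids, x)
            sums[j] += x
            counts[j] += 1
        # Keep the old centroid if a cell happens to be empty.
        centroids = [sums[j] / counts[j] if counts[j] else centroids[j]
                     for j in range(levels)]
    return sorted(centroids)


def quantize_block(block, centroids):
    """Rotate, normalise by the block RMS (assumed scale), then pick centroid codes."""
    rotated = fwht(block)
    scale = math.sqrt(sum(v * v for v in rotated) / len(rotated)) or 1.0
    codes = [nearest(centroids, v / scale) for v in rotated]
    return scale, codes


def dequantize_block(scale, codes, centroids):
    """Look up centroids, rescale, and undo the rotation."""
    rotated = [centroids[c] * scale for c in codes]
    return fwht(rotated)  # the orthonormal WHT is its own inverse


rng = random.Random(0)
# 8 centroids = 3-bit codes, trained on unit Gaussian samples (an assumption).
centroids = lloyd_max([rng.gauss(0.0, 1.0) for _ in range(4096)], levels=8)
block = [rng.gauss(0.0, 0.02) for _ in range(32)]  # hypothetical weight block
scale, codes = quantize_block(block, centroids)
restored = dequantize_block(scale, codes, centroids)
rel_err = math.sqrt(sum((a - b) ** 2 for a, b in zip(block, restored))
                    / sum(a * a for a in block))
print(f"relative reconstruction error: {rel_err:.3f}")
```

The rotation spreads outliers across the block so the rotated values look closer to Gaussian, which is the distribution the centroids were fitted to in this sketch. Because the normalised WHT is orthonormal, the error introduced in the rotated domain carries over unchanged in magnitude to the restored weights.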

Related news:

Show HN: TurboQuant for vector search – 2-4 bit compression

Google's TurboQuant saves memory, but won't save us from DRAM-pricing hell

What Google's TurboQuant can and can't do for AI's spiraling cost