TurboQuant model weight compression support added to llama.cpp
Summary: TQ3_1S (3-bit, 4.0 bits per weight) and TQ4_1S (4-bit, 5.0 bits per weight) weight quantization using a Walsh-Hadamard transform (WHT) rotation and Lloyd-Max centroids. V2.1 fused Metal kernel: zero threadgroup memory, cooperative SIMD rotation...
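To make the summary's "WHT rotation + Lloyd-Max centroids" concrete, here is a minimal Python sketch of the general technique. It is not the TurboQuant or llama.cpp code. The function names, the block size of 32, the per-block RMS scale, and fitting the centroids on standard Gaussian samples are all assumptions made for illustration. The sketch does no bit packing and says nothing about the Metal kernel.

```python
import math
import random


def fwht(values):
    """Orthonormal fast Walsh-Hadamard transform (it is its own inverse)."""
    out = list(values)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in out]


def nearest(centroids, x):
    """Index of the centroid closest to x."""
    return min(range(len(centroids)), key=lambda k: abs(centroids[k] - x))


def lloyd_max(samples, bits, iterations=20):
    """Fit 2**bits scalar centroids by alternating assignment and mean update."""
    levels = 2 ** bits
    ordered = sorted(samples)
    # Start from evenly spaced quantiles of the sample distribution.
    centroids = [ordered[(2 * k + 1) * len(ordered) // (2 * levels)]
                 for k in range(levels)]
    for _ in range(iterations):
        sums = [0.0] * levels
        counts = [0] * levels
        for x in samples:
            k = nearest(centroids, x)
            sums[k] += x
            counts[k] += 1
        # Move each centroid to the mean of its cell; leave empty cells alone.
        centroids = [sums[k] / counts[k] if counts[k] else centroids[k]
                     for k in range(levels)]
    return sorted(centroids)


def quantize_block(block, centroids):
    """Rotate, normalise by RMS (assumed scaling), map values to centroid indices."""
    rotated = fwht(block)
    rms = math.sqrt(sum(v * v for v in rotated) / len(rotated)) or 1.0
    codes = [nearest(centroids, v / rms) for v in rotated]
    return codes, rms


def dequantize_block(codes, rms, centroids):
    """Look up centroids, rescale, and undo the rotation."""
    return fwht([centroids[c] * rms for c in codes])
```

A typical use would be to fit eight centroids once with `lloyd_max(samples, bits=3)`, then call `quantize_block` on each 32-weight block and store the 3-bit codes plus the scale. The rotation spreads outliers across the block, so a single shared codebook fits the rotated values reasonably well. That is the usual motivation for pairing the two steps.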