
SeedLM: Compressing LLM Weights into Seeds of Pseudo-Random Generators


Large Language Models (LLMs) have transformed natural language processing, but face significant challenges in widespread deployment due to…

Specifically, for each block of weights, we find a seed that is fed into a Linear Feedback Shift Register (LFSR) during inference to efficiently generate a random matrix. Our experiments with Llama 3 70B, which is particularly challenging, show that zero-shot accuracy retention at 4- and 3-bit compression is on par with or better than state-of-the-art methods, while maintaining performance comparable to FP16 baselines. Additionally, FPGA-based tests demonstrate that 4-bit SeedLM, as model size increases, approaches a 4x speed-up over an FP16 Llama 2/3 baseline.
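The core idea above can be sketched in a few lines: store only a seed (plus a few quantized coefficients) per weight block, and regenerate the pseudo-random matrix from the seed at inference time with an LFSR. This is a minimal illustration, not the paper's implementation; the LFSR width and taps, the {-1, +1} entry mapping, and the coefficient scheme here are all assumptions for demonstration.

```python
import numpy as np

def lfsr_bits(seed: int, nbits: int, taps=(16, 14, 13, 11), width=16):
    """Generate a pseudo-random bit stream from a 16-bit Fibonacci LFSR.
    The taps correspond to a maximal-length 16-bit polynomial; the paper's
    exact LFSR configuration may differ (assumption)."""
    state = seed & ((1 << width) - 1)
    assert state != 0, "LFSR seed must be non-zero"
    out = []
    for _ in range(nbits):
        # Feedback bit is the XOR of the tap positions
        fb = 0
        for t in taps:
            fb ^= (state >> (t - 1)) & 1
        out.append(state & 1)          # emit the low bit
        state = (state >> 1) | (fb << (width - 1))
    return out

def random_matrix(seed: int, rows: int, cols: int) -> np.ndarray:
    """Expand a seed into a {-1, +1} pseudo-random matrix, one bit per entry,
    so the matrix never needs to be stored -- only the seed does."""
    bits = lfsr_bits(seed, rows * cols)
    return (2 * np.array(bits, dtype=np.float32) - 1).reshape(rows, cols)

def reconstruct_block(seed: int, coeffs: np.ndarray, block_len: int) -> np.ndarray:
    """Approximate a weight block as U @ coeffs, where U is regenerated on the
    fly from the stored seed. Memory cost per block: one seed plus a handful
    of quantized coefficients, instead of block_len full-precision weights."""
    U = random_matrix(seed, block_len, coeffs.shape[0])
    return U @ coeffs
```

Because the same seed always regenerates the same matrix, compression reduces to searching, per block, for the seed (and coefficients) that best approximate the original weights; decompression is just the cheap generation step above.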

