Nvidia’s Llama-3.1-Minitron 4B is a small language model that punches above its weight
Nvidia researchers used model pruning and distillation to create a small language model (SLM) at a fraction of the base cost.
As tech companies race to deliver on-device AI, we are seeing a growing body of research and techniques for creating small language models (SLMs) that can run on resource-constrained devices. “Pruning and classical knowledge distillation is a highly cost-effective method to progressively obtain LLMs [large language models] of smaller size, achieving superior accuracy compared to training from scratch across all domains,” the Nvidia researchers wrote. Other notable work in the field includes Sakana AI’s evolutionary model-merging algorithm, which assembles parts of different models to combine their strengths without the need for expensive training resources.
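The two techniques the researchers combine can be illustrated in miniature. The sketch below is a simplified, hypothetical illustration, not Nvidia's actual pipeline: it shows unstructured magnitude pruning over a flat weight list (the paper prunes whole layers, heads, and embedding channels) and the classical knowledge-distillation objective, a KL divergence between temperature-softened teacher and student output distributions.

```python
import math

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights.

    Toy unstructured pruning; structured pruning (as in the paper)
    removes entire layers, attention heads, or channels instead.
    """
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

def softmax(logits, temperature=1.0):
    """Convert logits to a probability distribution, softened by temperature."""
    exps = [math.exp(x / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence KL(teacher || student) over softened distributions --
    the classical knowledge-distillation loss the quote refers to."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Prune half the weights, then measure how far the pruned model's
# outputs have drifted from the teacher's.
pruned = magnitude_prune([0.1, -0.5, 0.03, 2.0], sparsity=0.5)
loss = distillation_loss([1.8, 0.2, -1.0], [2.0, 0.1, -0.9])
```

In practice the pruned student is then fine-tuned against this distillation loss (often mixed with the ordinary next-token loss), which is what lets it recover accuracy at a fraction of the cost of training from scratch.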
Or read this on VentureBeat