Nvidia’s Llama-3.1-Minitron 4B is a small language model that punches above its weight
Nvidia researchers used model pruning and distillation to create a small language model (SLM) at a fraction of the base cost.
As tech companies race to deliver on-device AI, we are seeing a growing body of research and techniques for creating small language models (SLMs) that can run on resource-constrained devices. “Pruning and classical knowledge distillation is a highly cost-effective method to progressively obtain LLMs [large language models] of smaller size, achieving superior accuracy compared to training from scratch across all domains,” the Nvidia researchers wrote. Other notable work in the field includes Sakana AI’s evolutionary model-merging algorithm, which assembles parts of different models to combine their strengths without the need for expensive training resources.
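The two techniques the researchers combine can be illustrated in miniature. The sketch below is a simplified, hypothetical illustration, not Nvidia's actual pipeline: it shows unstructured magnitude pruning over a flat weight list (the paper prunes whole layers, heads, and embedding channels) and the classical knowledge-distillation objective, a KL divergence between temperature-softened teacher and student output distributions.

```python
import math

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights.

    Toy unstructured pruning; structured pruning (as in the paper)
    removes entire layers, attention heads, or channels instead.
    """
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

def softmax(logits, temperature=1.0):
    """Convert logits to a probability distribution, softened by temperature."""
    exps = [math.exp(x / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence KL(teacher || student) over softened distributions --
    the classical knowledge-distillation loss the quote refers to."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Prune half the weights, then measure how far the pruned model's
# outputs have drifted from the teacher's.
pruned = magnitude_prune([0.1, -0.5, 0.03, 2.0], sparsity=0.5)
loss = distillation_loss([1.8, 0.2, -1.0], [2.0, 0.1, -0.9])
```

In practice the pruned student is then fine-tuned against this distillation loss (often mixed with the ordinary next-token loss), which is what lets it recover accuracy at a fraction of the cost of training from scratch.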
Or read this on VentureBeat