Distillation Can Make AI Models Smaller and Cheaper

A fundamental technique lets researchers use a big, expensive model to train another model for less.

In 2018, for instance, Google researchers unveiled a powerful language model called BERT, which the company soon began using to help parse billions of web searches. But BERT was big and costly to run, so the next year, other developers distilled a smaller version sensibly named DistilBERT, which became widely used in business and research. In January, the NovaSky lab at UC Berkeley showed that distillation works well for training chain-of-thought reasoning models, which use multistep “thinking” to better answer complicated questions.
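The core recipe behind results like DistilBERT can be sketched in a few lines of code. What follows is a minimal, illustrative sketch of knowledge distillation in PyTorch, not the actual BERT/DistilBERT training procedure: a frozen "teacher" network produces softened output probabilities, and a smaller "student" is trained to match them alongside the usual hard labels. The model sizes, temperature, and loss weighting below are assumptions chosen for clarity.

```python
# Minimal knowledge-distillation sketch (illustrative assumptions throughout):
# a small "student" learns to match the softened output distribution of a
# larger, frozen "teacher", plus the ordinary cross-entropy on hard labels.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

NUM_CLASSES = 10
TEMPERATURE = 2.0   # softens the teacher's distribution (assumed value)
ALPHA = 0.5         # weight between distillation loss and hard-label loss (assumed)

# Stand-ins for a large teacher and a much smaller student.
teacher = nn.Sequential(nn.Linear(128, 512), nn.ReLU(), nn.Linear(512, NUM_CLASSES))
student = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, NUM_CLASSES))

teacher.eval()  # the teacher is frozen; only the student is trained
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

for step in range(100):
    x = torch.randn(32, 128)                       # dummy inputs
    labels = torch.randint(0, NUM_CLASSES, (32,))  # dummy hard labels

    with torch.no_grad():
        teacher_logits = teacher(x)
    student_logits = student(x)

    # Distillation loss: KL divergence between temperature-softened distributions.
    soft_targets = F.softmax(teacher_logits / TEMPERATURE, dim=-1)
    log_probs = F.log_softmax(student_logits / TEMPERATURE, dim=-1)
    distill_loss = F.kl_div(log_probs, soft_targets, reduction="batchmean") * TEMPERATURE**2

    # Standard cross-entropy on the hard labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    loss = ALPHA * distill_loss + (1 - ALPHA) * hard_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Because the student never needs the teacher's weights, only its outputs on training examples, the same pattern applies whether the student is a compressed copy of one model or a new model trained on another model's generated answers.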



