
Distillation makes AI models smaller and cheaper


A fundamental technique lets researchers use a big, expensive “teacher” model to train a smaller “student” model at far lower cost.

In 2018, for instance, Google researchers unveiled a powerful language model called BERT, which the company soon began using to help parse billions of web searches. But BERT was big and costly to run, so the next year, other developers distilled a smaller version, sensibly named DistilBERT, which became widely used in business and research. In January, the NovaSky lab at the University of California, Berkeley, showed that distillation works well for training chain-of-thought reasoning models, which use multistep “thinking” to better answer complicated questions.
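To make the teacher-student idea concrete, here is a minimal sketch of knowledge distillation in PyTorch. It is not drawn from the article: the layer sizes, temperature T, and mixing weight alpha are illustrative assumptions. A small student network is trained to match the teacher's softened output distribution while also fitting the ordinary labels.

```python
# Minimal knowledge-distillation sketch (illustrative assumptions throughout):
# a small "student" learns to match the softened outputs of a larger,
# already-trained "teacher" while also fitting the true labels.
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical models: a big teacher (assumed pre-trained) and a much smaller student.
teacher = nn.Sequential(nn.Linear(784, 1024), nn.ReLU(), nn.Linear(1024, 10))
student = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 10))

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 2.0      # temperature: softens the teacher's probability distribution
alpha = 0.5  # weight between distillation loss and ordinary label loss

def distillation_step(x, labels):
    with torch.no_grad():
        teacher_logits = teacher(x)   # teacher predictions; no gradients needed
    student_logits = student(x)

    # KL divergence between softened teacher and student distributions,
    # scaled by T^2 to keep gradient magnitudes comparable across temperatures.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard_loss = F.cross_entropy(student_logits, labels)  # standard supervised loss
    loss = alpha * soft_loss + (1 - alpha) * hard_loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example: one training step on a random batch of 32 flattened 28x28 inputs.
x = torch.randn(32, 784)
labels = torch.randint(0, 10, (32,))
print(distillation_step(x, labels))
```

Dividing the logits by a temperature above 1 softens the teacher's probabilities, so the student learns not just the teacher's top answer but how it ranks the alternatives, which is much of what makes the smaller model nearly as capable for a fraction of the cost.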

