Distillation makes AI models smaller and cheaper
A fundamental technique lets researchers use a big, expensive “teacher” model to train a smaller “student” model for less.
In 2018, for instance, Google researchers unveiled a powerful language model called BERT, which the company soon began using to help parse billions of web searches. But BERT was big and costly to run, so the next year other developers distilled a smaller version, sensibly named DistilBERT, which became widely used in business and research. In January 2025, the NovaSky lab at the University of California, Berkeley, showed that distillation also works well for training chain-of-thought reasoning models, which use multistep “thinking” to better answer complicated questions.
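The core idea is that the student learns by imitating the teacher's output distribution rather than only the hard labels in a dataset. The sketch below, written with PyTorch, is a minimal and hypothetical illustration of that teacher-student setup: the tiny models, the temperature value, and the random input batch are assumptions for demonstration, not the recipe used for DistilBERT or NovaSky's reasoning models.

```python
# Minimal knowledge-distillation sketch (assumes PyTorch is installed).
# A frozen "teacher" produces softened predictions that supervise a smaller "student".
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical toy models for illustration; real teachers (e.g. BERT) are far larger.
teacher = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 10))
student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between temperature-softened teacher and student distributions."""
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * temperature ** 2

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
inputs = torch.randn(16, 32)  # stand-in batch of input features

with torch.no_grad():          # the teacher is frozen; only the student is trained
    teacher_logits = teacher(inputs)

student_logits = student(inputs)
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
optimizer.step()
print(f"distillation loss: {loss.item():.4f}")
```

In practice the distillation term is usually combined with an ordinary supervised loss on labeled data, and the student is chosen to be small enough to serve cheaply while staying close to the teacher's accuracy.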