Distillation makes AI models smaller and cheaper
A fundamental technique lets researchers use a big, expensive “teacher” model to train a smaller “student” model for less.
In 2018, for instance, Google researchers unveiled a powerful language model called BERT, which the company soon began using to help parse billions of web searches. But BERT was big and costly to run, so the next year other developers distilled a smaller version, sensibly named DistilBERT, which became widely used in business and research. In January 2025, the NovaSky lab at the University of California, Berkeley, showed that distillation also works well for training chain-of-thought reasoning models, which use multistep “thinking” to better answer complicated questions.
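The core idea is that the student learns by imitating the teacher's output distribution rather than only the hard labels in a dataset. The sketch below, written with PyTorch, is a minimal and hypothetical illustration of that teacher-student setup: the tiny models, the temperature value, and the random input batch are assumptions for demonstration, not the recipe used for DistilBERT or NovaSky's reasoning models.

```python
# Minimal knowledge-distillation sketch (assumes PyTorch is installed).
# A frozen "teacher" produces softened predictions that supervise a smaller "student".
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical toy models for illustration; real teachers (e.g. BERT) are far larger.
teacher = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 10))
student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between temperature-softened teacher and student distributions."""
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * temperature ** 2

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
inputs = torch.randn(16, 32)  # stand-in batch of input features

with torch.no_grad():          # the teacher is frozen; only the student is trained
    teacher_logits = teacher(inputs)

student_logits = student(inputs)
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
optimizer.step()
print(f"distillation loss: {loss.item():.4f}")
```

In practice the distillation term is usually combined with an ordinary supervised loss on labeled data, and the student is chosen to be small enough to serve cheaply while staying close to the teacher's accuracy.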