Distillation Can Make AI Models Smaller and Cheaper
A fundamental technique lets researchers use a big, expensive model to train another model for less.
In 2018, for instance, Google researchers unveiled a powerful language model called BERT, which the company soon began using to help parse billions of web searches. But BERT was big and costly to run, so the next year other developers distilled a smaller version, sensibly named DistilBERT, which became widely used in business and research. In January, the NovaSky lab at UC Berkeley showed that distillation also works well for training chain-of-thought reasoning models, which use multistep “thinking” to better answer complicated questions.
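To make the idea concrete, here is a minimal sketch of the classic distillation recipe, in which a small “student” model is trained to match the softened output probabilities of a large “teacher” model alongside the usual labels. This is the generic textbook formulation, not the specific training setup behind DistilBERT or the NovaSky reasoning models, and the function name, temperature `T`, and mixing weight `alpha` are illustrative choices.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend a 'soft' loss (imitate the teacher) with a 'hard' loss (fit the true labels)."""
    # Soft targets: the student's softened distribution should match the teacher's.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # scale by T^2 so gradients stay comparable across temperatures
    # Hard targets: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```

In a training loop, the teacher runs in inference mode to produce `teacher_logits` for each batch, and only the smaller student is updated, which is what makes the resulting model cheaper to serve.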