
Knowledge Distillation of Black-Box Large Language Models (2024)


Given the exceptional performance of proprietary large language models (LLMs) such as GPT-4, recent research has increasingly focused on improving smaller models through knowledge distillation (KD) from these powerful but black-box teachers. The teachers' high-quality outputs are valuable training data. However, their internal states are inaccessible, and this often limits effective knowledge transfer. To overcome this limitation, we introduce Proxy-KD, a novel method that uses a proxy model to transfer knowledge efficiently from black-box LLMs to smaller models. Our experiments show that Proxy-KD not only improves KD from black-box teacher models but also outperforms traditional white-box KD techniques. This approach offers a promising new avenue for distilling knowledge from advanced LLMs.
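The point about internal states becomes concrete when you contrast the two standard distillation signals. This is a minimal sketch of generic KD, not Proxy-KD itself, because the abstract does not describe Proxy-KD's mechanism. It assumes a single next-token position and a toy three-token vocabulary, and the function names are hypothetical. White-box KD matches the teacher's full probability distribution, so it needs the teacher's logits. Black-box KD can only train the student on the tokens the teacher actually emits.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with an optional temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def white_box_kd_loss(teacher_logits: List[float],
                      student_logits: List[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) for one token position.

    Needs the teacher's logits, i.e. white-box access to its internal state.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def black_box_kd_loss(teacher_token: int, student_logits: List[float]) -> float:
    """Cross-entropy of the student on the token the teacher emitted.

    Needs only the teacher's generated text (black-box access); the
    probabilities the teacher assigned to other tokens are lost.
    """
    q = softmax(student_logits)
    return -math.log(q[teacher_token])


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1]   # hypothetical teacher logits
    student = [0.5, 1.5, 0.2]   # hypothetical student logits
    print("white-box KL:", white_box_kd_loss(teacher, student))
    # In the black-box setting we only observe the teacher's chosen token (index 0).
    print("black-box CE:", black_box_kd_loss(0, student))
```

The design contrast is the whole point. The white-box loss uses every entry of the teacher's distribution, while the black-box loss reduces the teacher to a single observed token. Bridging that gap is the problem the abstract attributes to black-box teachers.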

