Researchers warn of ‘catastrophic overtraining’ in LLMs


The researchers compared two versions of OLMo-1B: one pre-trained on 2.3 trillion tokens and another on 3 trillion tokens. The version trained on more data proved harder to fine-tune, ultimately performing worse after fine-tuning than its less-trained counterpart.
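
A minimal sketch of that comparison is below, assuming the publicly released OLMo-1B weights on the Hugging Face Hub. The repo and revision identifiers and the fine_tune_and_evaluate helper are illustrative placeholders, not the authors' actual pipeline; the point is only that both checkpoints receive an identical fine-tuning and evaluation recipe.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder identifiers: the real repo name and intermediate-checkpoint
# revisions are listed on the model's Hub page and may differ from these.
CHECKPOINTS = {
    "2.3T tokens": ("allenai/OLMo-1B-hf", "intermediate-2300B-tokens"),
    "3.0T tokens": ("allenai/OLMo-1B-hf", "main"),
}

def fine_tune_and_evaluate(model, tokenizer) -> float:
    """Placeholder for one fixed instruction-tuning run plus a benchmark
    evaluation; both checkpoints must go through exactly the same recipe."""
    raise NotImplementedError

scores = {}
for label, (repo, revision) in CHECKPOINTS.items():
    tokenizer = AutoTokenizer.from_pretrained(repo, revision=revision)
    model = AutoModelForCausalLM.from_pretrained(repo, revision=revision)
    scores[label] = fine_tune_and_evaluate(model, tokenizer)

print(scores)  # the paper reports the 3T-token checkpoint ending up worse
```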

The study, titled “Overtrained Language Models Are Harder to Fine-Tune”, is available on arXiv and was led by Jacob Mitchell Springer, along with co-authors Sachin Goyal, Kaiyue Wen, Tanishq Kumar, Xiang Yue, Sadhika Malladi, Graham Neubig, and Aditi Raghunathan.

In practice, attempts to mitigate catastrophic overtraining, such as adjusting fine-tuning learning rates or adding regularization, may delay its onset but cannot fully eliminate it without sacrificing downstream performance. Rather than focusing exclusively on increasing pre-training budgets, developers may need to reassess their strategies for optimizing downstream performance without incurring these negative effects.
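
To show where those two knobs sit in a typical fine-tuning setup, here is a minimal PyTorch/Transformers sketch. It is not the paper's recipe; the model identifier, learning rate, and weight-decay value are assumptions chosen for illustration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed HF-format OLMo repo; any causal-LM identifier works the same way.
model_name = "allenai/OLMo-1B-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.train()

# The two mitigation levers mentioned above: a smaller fine-tuning
# learning rate and explicit regularization (weight decay here).
# Both values are illustrative, not taken from the paper.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5, weight_decay=0.1)

# One illustrative fine-tuning step on a toy batch.
batch = tokenizer(["An example instruction-tuning sample."], return_tensors="pt")
outputs = model(**batch, labels=batch["input_ids"])
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```

Sweeping exactly these two values is, per the study, enough to postpone the degradation on an overtrained checkpoint but not to remove it.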

Read the original story on VentureBeat.

Related news:

OpenAI now pays researchers $100,000 for critical vulnerabilities

Researchers get spiking neural behavior out of a pair of transistors

A safe nuclear battery that could last a lifetime: researchers are considering radiocarbon (carbon-14) as the basis for safe, small, and affordable betavoltaic batteries that could last decades or longer without charging.