Researchers warn of ‘catastrophic overtraining’ in LLMs
The researchers compared two versions of OLMo-1B: one pre-trained on 2.3 trillion tokens and another on 3 trillion tokens.
The study, titled “Overtrained Language Models Are Harder to Fine-Tune”, is available on arXiv and was led by Jacob Mitchell Springer, along with co-authors Sachin Goyal, Kaiyue Wen, Tanishq Kumar, Xiang Yue, Sadhika Malladi, Graham Neubig, and Aditi Raghunathan.

In practice, the authors find that attempts to mitigate this effect, such as lowering fine-tuning learning rates or adding regularization, may delay the onset of catastrophic overtraining but cannot fully eliminate it without sacrificing downstream performance. Rather than focusing exclusively on increasing pre-training budgets, developers may need to reassess their strategies for optimizing downstream performance without incurring the negative effects of catastrophic overtraining.
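To make the mitigation concrete, here is a minimal sketch, not the authors' actual setup, of the kind of adjustment the study finds only delays the effect: fine-tuning a pre-trained checkpoint with a reduced learning rate and weight-decay regularization. The checkpoint name, hyperparameter values, and helper function are illustrative assumptions.

```python
# Minimal sketch (not the paper's code): fine-tuning with a lowered learning
# rate and weight decay, the kind of mitigation the study says can delay but
# not eliminate catastrophic overtraining.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "allenai/OLMo-1B-hf"  # assumed checkpoint name, for illustration only

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Reduced learning rate and weight decay limit how far fine-tuning drifts
# from the pre-trained weights; values here are illustrative, not prescribed.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5, weight_decay=0.1)


def fine_tune_step(batch_texts):
    """One standard causal-LM fine-tuning step on a list of text examples."""
    inputs = tokenizer(
        batch_texts, return_tensors="pt", padding=True, truncation=True
    )
    outputs = model(**inputs, labels=inputs["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return outputs.loss.item()
```

Even under such conservative settings, the study reports that models pre-trained on more tokens can end up worse after fine-tuning than their less-trained counterparts.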