
Calculating the cost of a Google DeepMind paper


Recently, GDM released a paper titled Scaling Exponents Across Parameterizations and Optimizers, in which they conduct over 10,000 LLM training runs to obtain optimal hyperparameters under different regimes. After reading it (it was great), I wanted to test my understanding of the paper by tallying up all of the experiments conducted within and calculating the total compute cost it would take to replicate them.
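Before getting into the tally, here is the kind of back-of-the-envelope arithmetic I'll be doing. This is a minimal sketch assuming the standard C ≈ 6ND approximation (6 FLOPs per parameter per token); the GPU throughput, utilization, hourly price, and the example model/token sizes are placeholder assumptions of mine, not numbers from the paper.

def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training FLOPs for a dense transformer (C ~ 6 * N * D)."""
    return 6.0 * n_params * n_tokens


def dollar_cost(flops: float,
                flops_per_gpu_hour: float = 312e12 * 3600 * 0.4,
                dollars_per_gpu_hour: float = 2.0) -> float:
    """Convert FLOPs to dollars, assuming an A100-class GPU (312 TFLOPS peak BF16)
    at ~40% utilization and a placeholder hourly price."""
    gpu_hours = flops / flops_per_gpu_hour
    return gpu_hours * dollars_per_gpu_hour


if __name__ == "__main__":
    # Hypothetical example: a 100M-parameter model trained on 2B tokens.
    c = training_flops(1e8, 2e9)
    print(f"{c:.3e} FLOPs, roughly ${dollar_cost(c):.2f}")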

Extra weight decay experiments:
- static settings: Adam, per-layer, full alignment, decoupled weight decay of 1e-4
- 4x parameterizations
- an LR-experiment-like sweep across all 14 model widths

Due to my desire to finish this blog post in a reasonable amount of time, I made the unprincipled decision of approximating the number of experiments per line in any given Eval Loss vs Base Learning Rate plot as 15. Plus, if we look at Figure 6, notice that the range of evaluated learning rates actually seems constant there, unlike in the normal Eval Loss vs Base LR plots.
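To make the tally concrete, here is a rough counting sketch for the weight decay sweep above under my 15-runs-per-line approximation. How the factors multiply is my reading of the setup, not a figure quoted from the paper.

# Rough tally for the extra weight decay sweep described above.
# Assumption (mine): each (parameterization, model width) pair is one line in an
# Eval Loss vs Base Learning Rate plot, and each line holds ~15 learning-rate points.
N_PARAMETERIZATIONS = 4   # the 4 parameterizations
N_MODEL_WIDTHS = 14       # "all 14 model widths"
RUNS_PER_LINE = 15        # my unprincipled experiments-per-line approximation

extra_weight_decay_runs = N_PARAMETERIZATIONS * N_MODEL_WIDTHS * RUNS_PER_LINE
print(extra_weight_decay_runs)  # -> 840 runs under these assumptions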
