
Modded-NanoGPT: NanoGPT (124M) quality in 3.25B tokens


Modded-NanoGPT is a variant of NanoGPT that reaches NanoGPT (124M) quality in 3.25B training tokens. The repository is maintained by KellerJordan on GitHub.

The optimizer uses non-convergent coefficients for its quintic polynomial in order to maximize the slope at zero and thereby minimize the number of Newton-Schulz iterations required. Jeremy Bernstein (@jxbz) sent the authors a draft, which prompted experiments with various Newton-Schulz iterations as the orthogonalization method for this optimizer. The proposed optimizer can be thought of as a second way of smoothing spectral steepest descent, with a different set of memory and runtime tradeoffs than Shampoo.
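A minimal sketch of the idea described above, in NumPy: a quintic Newton-Schulz iteration that approximately orthogonalizes a matrix (pushes all singular values toward 1). The specific coefficient values below are assumptions for illustration, not necessarily the ones used in the repository; because the coefficients are tuned for a steep slope at zero rather than exact convergence, the result is only an approximate orthogonalization.

```python
import numpy as np

def newton_schulz_quintic(G, steps=5, eps=1e-7):
    """Approximately orthogonalize G via a quintic Newton-Schulz iteration.

    Each step applies X <- a*X + b*(X X^T) X + c*(X X^T)^2 X, which acts as
    the odd polynomial f(s) = a*s + b*s^3 + c*s^5 on each singular value s.
    """
    # Assumed coefficients: chosen so that f has a large slope (a) at zero,
    # driving small singular values toward 1 quickly, at the cost of not
    # converging exactly to the orthogonal factor.
    a, b, c = 3.4445, -4.7750, 2.0315
    # Scale so all singular values are <= 1 (Frobenius norm bounds spectral norm).
    X = G / (np.linalg.norm(G) + eps)
    # Work with the wide orientation so X @ X.T is the smaller Gram matrix.
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X
    return X.T if transposed else X
```

For a random Gaussian matrix, a handful of steps brings every singular value into a band around 1, which is all the optimizer needs from the orthogonalization.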
