How to Train a Million Context LLM


Scaling Llama3 beyond 1M context window with ~perfect utilization, the difference between ALiBi and RoPE, how to use GPT-4 to create synthetic data for your context extension finetunes, and more!
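As a quick primer on the ALiBi/RoPE distinction mentioned above: ALiBi adds a static, distance-proportional bias to the attention logits, while RoPE rotates query/key channel pairs by position-dependent angles so relative position falls out of the dot product itself. Here is a minimal sketch of both, with illustrative shapes and a single attention head; the slope and base values are common defaults, not settings from any particular model.

```python
# Sketch contrasting ALiBi and RoPE (illustrative, single head).
import torch

def alibi_bias(seq_len: int, slope: float = 0.0625) -> torch.Tensor:
    """ALiBi: a static bias added to attention logits, growing linearly
    with query-key distance (no position embeddings at all)."""
    pos = torch.arange(seq_len)
    # Bias is -slope * (i - j) for keys at or before the query; positions
    # after the query are masked out by causal attention anyway.
    return -slope * (pos[:, None] - pos[None, :]).clamp(min=0)

def rope_rotate(x: torch.Tensor, base: float = 10_000.0) -> torch.Tensor:
    """RoPE: rotate pairs of channels by position-dependent angles, so the
    q.k dot product depends only on relative position."""
    seq_len, dim = x.shape
    inv_freq = 1.0 / base ** (torch.arange(0, dim, 2) / dim)
    angles = torch.arange(seq_len)[:, None] * inv_freq[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, 0::2], x[:, 1::2]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```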

Greg’s now-famous "needle in a haystack" (NIAH) test, which measures a model's ability to extract a piece of information embedded in a long context, is the clean standard everyone starts with, but it is a little simplistic, and the community has since created many options to extend it (a minimal sketch of such a harness appears below). There is also a lot of related research on model merging: techniques such as TIES, and others that effectively apply a singular value decomposition on top of the weights, keep only the most important components, and prevent interference across all the other layers. A lot of the empirical work in scaling these things up is experimentation: figuring out how to marshal these really complicated composite functions so that they don't hit a divide-by-zero at some point.
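To make the NIAH setup concrete, here is a minimal sketch of such a harness. Everything in it is an illustrative assumption: `query_model` stands in for your own inference endpoint, and the filler sentences and needle are placeholders.

```python
# Minimal needle-in-a-haystack (NIAH) sketch.
import random

NEEDLE = "The magic number for the vault is 48291."
QUESTION = "What is the magic number for the vault?"

def build_haystack(filler: list[str], context_len: int, depth: float) -> str:
    """Insert the needle at a relative `depth` (0.0 = start, 1.0 = end)
    inside roughly `context_len` characters of filler text."""
    haystack = ""
    while len(haystack) < context_len:
        haystack += random.choice(filler) + " "
    pos = int(len(haystack) * depth)
    return haystack[:pos] + NEEDLE + " " + haystack[pos:]

def run_niah(query_model, filler,
             lengths=(8_000, 64_000, 512_000),
             depths=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Sweep context length x needle depth and record retrieval accuracy."""
    results = {}
    for n in lengths:
        for d in depths:
            prompt = build_haystack(filler, n, d) + "\n\n" + QUESTION
            answer = query_model(prompt)          # hypothetical inference call
            results[(n, d)] = "48291" in answer   # simple substring check
    return results
```

The sweep over context lengths and needle depths is what produces the familiar NIAH heatmap of retrieval accuracy.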
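And here is a rough sketch of the SVD idea: decompose a finetune's weight delta against the base model, keep only the top singular components, and discard the rest to reduce interference when merging. The rank cutoff and toy tensors are assumptions for illustration, not the exact recipe from any particular merging paper.

```python
# Sketch: low-rank "trimming" of a weight delta via SVD. Keeping only the
# top singular components retains the most important directions of a
# finetune while reducing interference with other merged weights.
import torch

def svd_trim_delta(w_finetuned: torch.Tensor, w_base: torch.Tensor,
                   rank: int) -> torch.Tensor:
    """Return base weights plus a rank-`rank` approximation of the delta."""
    delta = w_finetuned - w_base
    u, s, vh = torch.linalg.svd(delta, full_matrices=False)
    # Zero out all but the `rank` largest singular values.
    s_trimmed = torch.zeros_like(s)
    s_trimmed[:rank] = s[:rank]
    return w_base + u @ torch.diag(s_trimmed) @ vh

# Usage with toy tensors (shapes are illustrative):
base = torch.randn(512, 512)
finetuned = base + 0.01 * torch.randn(512, 512)
merged = svd_trim_delta(finetuned, base, rank=16)
```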
