How Gradient created an open LLM with a million-token context window
AI startup Gradient and cloud platform Crusoe teamed up to extend the context window of Meta's Llama 3 models to 1 million tokens.
They used techniques for distributed attention developed by Berkeley AI Research (BAIR), which let them increase the context length without exploding memory and compute costs.

One of the key benchmarks for evaluating long context windows is the "needle in a haystack" test, in which a very specific piece of information is inserted at different positions in a long sequence of text and the model is questioned about it.

Longer contexts also unlock practical gains. For example, agentic systems, in which one or more language models take on multiple roles in a workflow, can accomplish more with fewer calls because each request can carry much more information.
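To make the "needle in a haystack" idea concrete, here is a minimal sketch of such an evaluation harness. This is an illustration of the benchmark's structure, not Gradient's actual evaluation code; the `ask_model` callable is a hypothetical stand-in for a real LLM API call.

```python
def build_haystack(filler_sentences, needle, depth):
    """Insert the needle at a relative depth (0.0 = start, 1.0 = end) of the filler text."""
    idx = int(len(filler_sentences) * depth)
    parts = filler_sentences[:idx] + [needle] + filler_sentences[idx:]
    return " ".join(parts)


def needle_in_haystack_eval(ask_model, needle, expected_answer,
                            n_fillers=1000,
                            depths=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Run the needle test at several insertion depths.

    ask_model(context, question) -> str is a hypothetical hook for a
    long-context LLM call; the test checks whether the model's answer
    contains the expected fact at each depth.
    """
    filler = [f"Filler sentence number {i}." for i in range(n_fillers)]
    results = {}
    for depth in depths:
        context = build_haystack(filler, needle, depth)
        response = ask_model(context, "What is the secret number?")
        results[depth] = expected_answer in response
    return results
```

A trivial usage example, with a stub "model" that just searches the context for the needle (a real run would substitute an actual LLM call here):

```python
def stub_model(context, question):
    # Stand-in for an LLM: return the sentence mentioning the secret, if any.
    for sentence in context.split("."):
        if "secret number" in sentence:
            return sentence
    return "not found"

scores = needle_in_haystack_eval(stub_model, "The secret number is 42.", "42")
```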
Or read this on VentureBeat