How Gradient created an open LLM with a million-token context window


AI startup Gradient and cloud platform Crusoe teamed up to extend the context window of Meta's Llama 3 models to 1 million tokens.

They used distributed-attention techniques developed at Berkeley AI Research (BAIR), which let them increase the context length without exploding memory and compute costs.

One of the key benchmarks for evaluating long context windows is the "needle in a haystack" test: a very specific piece of information is inserted at different positions in a long sequence of text, and the model is then questioned about it.

Longer contexts also benefit agentic systems, where one or more language models take on multiple roles in a workflow: they can do more with fewer calls, because each request can carry far more information.
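The core trick behind such distributed-attention methods is computing attention over the key/value sequence one block at a time with a streaming softmax, so the full attention matrix is never materialized. Below is a minimal single-machine NumPy sketch of that blockwise computation; it is illustrative only, not Gradient's or BAIR's actual implementation (which distributes the blocks across devices).

```python
import numpy as np

def blockwise_attention(q, k, v, block=256):
    """Compute softmax(q @ k.T / sqrt(d)) @ v one key/value block at a time.

    A running max (m), running softmax denominator (l), and running
    weighted sum of values (acc) are corrected as each block arrives,
    so peak memory scales with the block size, not the sequence length.
    """
    d = q.shape[-1]
    scale = 1.0 / np.sqrt(d)
    m = np.full(q.shape[0], -np.inf)        # running max of logits per query
    l = np.zeros(q.shape[0])                # running softmax denominator
    acc = np.zeros_like(q, dtype=float)     # running weighted sum of values
    for start in range(0, k.shape[0], block):
        kb, vb = k[start:start + block], v[start:start + block]
        logits = (q @ kb.T) * scale
        m_new = np.maximum(m, logits.max(axis=-1))
        correction = np.exp(m - m_new)      # rescale old running stats
        p = np.exp(logits - m_new[:, None])
        l = l * correction + p.sum(axis=-1)
        acc = acc * correction[:, None] + p @ vb
        m = m_new
    return acc / l[:, None]
```

The output is numerically identical to full attention; only the memory profile changes, which is what makes million-token contexts tractable.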
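A needle-in-a-haystack run can be sketched as follows. All names here (`build_prompt`, the filler text, the stubbed `model.generate` call) are hypothetical, just to show the shape of the evaluation: plant a fact at a chosen depth in a long context, ask about it, and check the answer.

```python
FILLER = "The sky was clear and the city hummed along as usual. "
NEEDLE = "The secret passphrase is 'blue-crusoe-42'."
QUESTION = "What is the secret passphrase mentioned in the text?"

def build_prompt(context_tokens: int, depth: float) -> str:
    """Place the needle at a relative depth (0.0 = start, 1.0 = end)
    in filler sized to roughly `context_tokens` tokens (approximated
    here as one token per word)."""
    n_filler = max(context_tokens - len(NEEDLE.split()), 0)
    copies = n_filler // len(FILLER.split()) + 1
    words = (FILLER * copies).split()[:n_filler]
    pos = int(depth * len(words))
    haystack = " ".join(words[:pos] + [NEEDLE] + words[pos:])
    return f"{haystack}\n\nQuestion: {QUESTION}\nAnswer:"

def score(answer: str) -> bool:
    """Pass if the model surfaces the planted fact."""
    return "blue-crusoe-42" in answer

# Sweep context lengths and needle depths; the model call is omitted.
for ctx in (1_000, 8_000, 128_000):
    for depth in (0.0, 0.5, 1.0):
        prompt = build_prompt(ctx, depth)
        # answer = model.generate(prompt)   # hypothetical long-context model
        # passed = score(answer)
```

Sweeping both axes is what produces the familiar depth-versus-length heatmaps used to show where retrieval degrades.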

Read the full story on VentureBeat.

