DeepMind’s Michelangelo benchmark reveals limitations of long-context LLMs


LLMs can retrieve disparate facts from their long context windows, but they struggle when asked to reason over that context.

“Over time, models have grown considerably more capable in long context performance,” Kiran Vodrahalli, a research scientist at Google DeepMind, told VentureBeat. But most existing benchmarks test only simple retrieval from that context. To address those limitations, the researchers introduced Michelangelo, a “minimal, synthetic, and unleaked long-context reasoning evaluation for large language models.” The benchmark evaluates a model’s ability to understand the relationships and structure of the information within its context window, rather than simply to retrieve isolated facts.
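
The article does not include DeepMind’s task code, but the idea can be sketched with a toy synthetic task in the same spirit: the relevant statements are scattered through a long context, and answering correctly requires tracking the latent state they build up rather than looking up any single fact. The snippet below is an illustrative sketch only; the function name, prompt format, and `my_list` variable are invented for this example and are not Michelangelo’s actual tasks.

```python
import random


def make_latent_list_task(num_ops=20, num_distractors=100, seed=0):
    """Build a long prompt of list operations buried among distractor lines,
    plus the ground-truth final list a model should be able to reconstruct."""
    rng = random.Random(seed)
    state = []   # ground-truth list that the scattered operations act on
    lines = []   # prompt lines: operations interleaved with distractors

    ops_left, distractors_left = num_ops, num_distractors
    while ops_left or distractors_left:
        # Weighted coin flip so operations and distractors run out together.
        emit_op = ops_left and rng.random() < ops_left / (ops_left + distractors_left)
        if emit_op:
            op = rng.choice(["append", "pop", "remove"]) if state else "append"
            if op == "append":
                value = rng.randint(0, 9)
                state.append(value)
                lines.append(f"my_list.append({value})")
            elif op == "pop":
                state.pop()
                lines.append("my_list.pop()")
            else:
                value = rng.choice(state)
                state.remove(value)
                lines.append(f"my_list.remove({value})")
            ops_left -= 1
        else:
            # Filler that never touches my_list; it only pads the context.
            lines.append(f"print('note {len(lines)}: unrelated to my_list')")
            distractors_left -= 1

    prompt = (
        "my_list = []\n"
        + "\n".join(lines)
        + "\nWhat is the final value of my_list? Answer with the list only."
    )
    return prompt, state


if __name__ == "__main__":
    prompt, answer = make_latent_list_task(num_ops=5, num_distractors=10)
    print(prompt)
    print("expected:", answer)
```

A retrieval-oriented model can locate any single operation in such a prompt; producing the correct final list requires integrating all of them in order, which is the kind of structural reasoning the benchmark is described as targeting.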

Related news:

A pair of DeepMind researchers have won the 2024 Nobel Prize in Chemistry

DeepMind’s Demis Hassabis and John Jumper scoop Nobel Prize in Chemistry for AlphaFold

Tim Brooks, Creator of Sora, Leaves OpenAI for DeepMind