DeepMind’s Michelangelo benchmark reveals limitations of long-context LLMs
LLMs can retrieve disparate facts from their context windows, but when it comes to reasoning over their context, they struggle badly.
“Over time, models have grown considerably more capable in long context performance,” Kiran Vodrahalli, research scientist at Google DeepMind, told VentureBeat.

To address the limitations of current benchmarks, the researchers introduced Michelangelo, a “minimal, synthetic, and unleaked long-context reasoning evaluation for large language models.” The benchmark focuses on evaluating the model’s ability to understand the relationships and structure of the information within its context window, rather than simply retrieving isolated facts.
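To see how such a task differs from a needle-in-a-haystack retrieval probe, here is a toy illustration (not DeepMind's actual task generator; the function name and prompt format are invented for this sketch). It builds a synthetic "track the list" prompt whose answer cannot be looked up anywhere in the context; the model must apply every operation in order:

```python
import random

def make_latent_list_task(n_ops=8, seed=0):
    """Toy long-context reasoning probe (hypothetical, for illustration):
    the correct answer requires composing all operations in sequence,
    not retrieving any single fact from the prompt."""
    rng = random.Random(seed)
    lst, lines = [], []
    for _ in range(n_ops):
        op = rng.choice(["append", "pop", "remove"])
        if op == "append" or not lst:
            # appending is always valid; also used when the list is empty
            x = rng.randint(0, 9)
            lst.append(x)
            lines.append(f"my_list.append({x})")
        elif op == "pop":
            lst.pop()
            lines.append("my_list.pop()")
        else:
            x = rng.choice(lst)
            lst.remove(x)
            lines.append(f"my_list.remove({x})")
    prompt = "my_list = []\n" + "\n".join(lines) + "\nWhat is my_list now?"
    return prompt, lst  # ground-truth answer is the final list state
```

Padding the prompt with irrelevant text between operations scales the context length while keeping the reasoning requirement fixed, which is the property that distinguishes this style of evaluation from simple fact retrieval.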