Reasoning skills of large language models are often overestimated


MIT CSAIL researchers developed an evaluation framework that probes large language models with counterfactual variants of familiar tasks. They found that LLMs can recite memorized answers, but struggle to reason abstractly when a task deviates from what they have seen before.

MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) researchers recently peered through the proverbial magnifying glass to examine how LLMs fare on variations of familiar tasks, revealing intriguing insights into the interplay between memorization and reasoning skills. One counterfactual test involved chess played from altered starting positions: while human players are expected to still be able to determine the legality of moves in such altered scenarios (given enough time), the models struggled and couldn't perform better than random guessing, meaning they have limited ability to generalize to unfamiliar situations. "This insight is crucial as we strive to enhance these models' adaptability and broaden their application horizons," says Zhaofeng Wu, an MIT PhD student in electrical engineering and computer science, CSAIL affiliate, and the lead author of a new paper about the research.
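To make the setup concrete, here is a minimal sketch of what such a counterfactual evaluation could look like, using arithmetic in an unfamiliar number base as the altered task. This is an illustration in the spirit of the study, not the researchers' actual harness; the `query_model` stub is a hypothetical placeholder for whatever LLM API is being tested.

```python
# Sketch of a counterfactual evaluation: compare an LLM's accuracy on a
# familiar task (base-10 addition) against a counterfactual variant
# (e.g., base-9 addition). Memorization predicts a sharp accuracy drop
# on the counterfactual condition; genuine reasoning predicts little drop.
import random


def to_base(n: int, base: int) -> str:
    """Render a non-negative integer as a digit string in the given base (2-10)."""
    if n == 0:
        return "0"
    digits = []
    while n:
        digits.append(str(n % base))
        n //= base
    return "".join(reversed(digits))


def make_problem(base: int) -> tuple[str, str]:
    """Build one addition prompt and its correct answer, both written in `base`."""
    a, b = random.randint(10, 99), random.randint(10, 99)
    prompt = (
        f"You are working in base-{base}. "
        f"What is {to_base(a, base)} + {to_base(b, base)}? "
        "Answer with digits only."
    )
    return prompt, to_base(a + b, base)


def query_model(prompt: str) -> str:
    """Hypothetical placeholder: swap in a real LLM API call here."""
    raise NotImplementedError("Replace with a call to the model under test.")


def accuracy(base: int, n_trials: int = 100) -> float:
    """Fraction of randomly generated problems the model answers correctly."""
    correct = 0
    for _ in range(n_trials):
        prompt, answer = make_problem(base)
        if query_model(prompt).strip() == answer:
            correct += 1
    return correct / n_trials


# The pattern the researchers report: strong accuracy in the default
# condition (base 10) alongside near-chance accuracy in counterfactual
# conditions, e.g.:
# print(accuracy(10), accuracy(9))
```

The key design point is that the default and counterfactual conditions require the same underlying procedure (carrying digits during addition), so a gap between the two scores isolates memorization of the familiar setting from the reasoning skill itself.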
