Apple Engineers Show How Flimsy AI ‘Reasoning’ Can Be


The new frontier in large language models is the ability to “reason” their way through problems. New research from Apple suggests that ability may not be all it’s cracked up to be.

A new study from six Apple engineers shows that the mathematical “reasoning” displayed by advanced large language models can be extremely brittle and unreliable in the face of seemingly trivial changes to common benchmark problems. The fragility highlighted in these results supports previous research suggesting that LLMs’ reliance on probabilistic pattern matching lacks the formal understanding of underlying concepts needed for truly reliable mathematical reasoning. The fact that such small changes lead to such variable results suggests to the researchers that these models are not doing any “formal” reasoning but are instead “attempt[ing] to perform a kind of in-distribution pattern-matching, aligning given questions and solution steps with similar ones seen in the training data.”
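The study behind these results, known as GSM-Symbolic, probes that pattern-matching hypothesis by generating many surface variants of grade-school math problems: the names and numeric values change from variant to variant, but the underlying arithmetic does not. Below is a minimal Python sketch of that idea, assuming a single toy template; the TEMPLATE string, NAMES list, and make_variant helper are illustrative stand-ins, not the paper’s actual code.

```python
import random

# Toy template in the spirit of GSM-Symbolic: generate surface variants
# of a GSM8K-style word problem by swapping names and numbers while the
# underlying arithmetic (and thus the correct answer) stays the same.
TEMPLATE = ("{name} picks {n} apples on Monday and {m} apples on Tuesday. "
            "How many apples does {name} have in total?")

NAMES = ["Sophie", "Liam", "Ava", "Noah"]  # illustrative, not from the paper

def make_variant(rng: random.Random) -> tuple[str, int]:
    """Return (question text, ground-truth answer) for one random variant."""
    name = rng.choice(NAMES)
    n, m = rng.randint(2, 40), rng.randint(2, 40)
    return TEMPLATE.format(name=name, n=n, m=m), n + m

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the variants are reproducible
    for _ in range(3):
        question, answer = make_variant(rng)
        print(f"{question}  -> expected: {answer}")
```

A model that reasoned formally would score identically across such variants; the Apple team instead measured accuracy that fluctuated from variant to variant and degraded further when irrelevant but plausible-sounding clauses were added to the questions.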

Read the full story on Wired.

Read more on: reasoning, apple engineers

Related news:

OpenAI Threatening to Ban Users for Asking Strawberry About Its Reasoning

OpenAI o1 has threatened to ban users when asking about its reasoning

OpenAI’s new model is better at reasoning and, occasionally, deceiving