Apple Engineers Show How Flimsy AI ‘Reasoning’ Can Be
The new frontier in large language models is the ability to “reason” their way through problems. New research from Apple says it's not quite what it's cracked up to be.
A new study from six Apple engineers shows that the mathematical “reasoning” displayed by advanced large language models can be extremely brittle and unreliable in the face of seemingly trivial changes to common benchmark problems. The fragility highlighted in these results supports previous research suggesting that LLMs’ reliance on probabilistic pattern matching lacks the formal understanding of underlying concepts needed for truly reliable mathematical reasoning. The fact that such small changes lead to such variable results suggests to the researchers that these models are not doing any “formal” reasoning but are instead “attempt[ing] to perform a kind of in-distribution pattern-matching, aligning given questions and solution steps with similar ones seen in the training data.”
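To make the idea of “seemingly trivial changes” concrete, here is a minimal sketch of how one might probe this brittleness: generate variants of a grade-school word problem that differ only in the names and numbers used, then check how often a model’s answer stays correct across them. This is an illustration only, not the Apple team’s methodology or code; the word-problem template, the make_variant() and robustness_rate() helpers, and the model_answer() placeholder are all hypothetical.

```python
import random

# Illustrative template for a GSM8K-style word problem. Swapping the name
# and the number should not change the reasoning required to solve it.
TEMPLATE = ("{name} picks {base} apples on Monday and twice as many on "
            "Tuesday. How many apples does {name} have in total?")

def make_variant():
    """Produce a superficially different variant plus its ground-truth answer."""
    name = random.choice(["Sophie", "Liam", "Mia", "Noah"])
    base = random.randint(3, 40)
    question = TEMPLATE.format(name=name, base=base)
    correct = base + 2 * base  # Monday's apples plus twice as many on Tuesday
    return question, correct

def model_answer(question: str) -> int:
    # Placeholder: call whatever LLM is under test and parse its numeric answer.
    raise NotImplementedError("hook up the model being evaluated here")

def robustness_rate(trials: int = 100) -> float:
    """Fraction of trivially rephrased variants the model answers correctly."""
    correct_count = 0
    for _ in range(trials):
        question, correct = make_variant()
        if model_answer(question) == correct:
            correct_count += 1
    return correct_count / trials
```

A model that had genuinely grasped the underlying arithmetic would score the same on every variant; large swings in the score across such cosmetic changes are the kind of fragility the researchers describe.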