LLMs generate ‘fluent nonsense’ when reasoning outside their training zone


Chain-of-Thought isn't a plug-and-play solution. For developers, this research offers a blueprint for LLM testing and strategic fine-tuning.

A new study from Arizona State University researchers suggests that the celebrated “Chain-of-Thought” (CoT) reasoning in Large Language Models (LLMs) may be more of a “brittle mirage” than genuine intelligence. As the paper notes, “theoretical and empirical evidence shows that CoT generalizes well only when test inputs share latent structures with training data; otherwise, performance declines sharply.” The researchers offer a direct warning to practitioners, highlighting “the risk of relying on CoT as a plug-and-play solution for reasoning tasks and caution against equating CoT-style output with human thinking.” They provide three key pieces of advice for developers building applications with LLMs.
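The testing advice amounts to probing for distribution shift before trusting CoT output in production. The sketch below is one minimal way to do that: it scores a model's chain-of-thought answers separately on prompts that mirror familiar task structure and on structurally novel ones, so a large accuracy gap between the two splits flags the brittleness the paper describes. The `ask_model` callable, the `Probe` class, and the in/out labels are illustrative placeholders, not anything taken from the study.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

@dataclass
class Probe:
    question: str       # task posed to the model
    expected: str       # substring that marks a correct final answer
    distribution: str   # "in" = mirrors familiar task structure, "out" = novel structure

def evaluate_cot(ask_model: Callable[[str], str],
                 probes: List[Probe]) -> Dict[str, Optional[float]]:
    """Score CoT answers separately for familiar vs. structurally novel probes."""
    hits: Dict[str, List[bool]] = {"in": [], "out": []}
    for probe in probes:
        prompt = f"{probe.question}\nThink step by step, then state the final answer."
        answer = ask_model(prompt)
        hits[probe.distribution].append(probe.expected.lower() in answer.lower())
    # Accuracy per split; a wide in/out gap is the warning sign the study describes.
    return {split: (sum(v) / len(v) if v else None) for split, v in hits.items()}
```

In practice, `ask_model` would wrap whatever LLM client the application already uses, and the probe set would come from the app's own tasks, with the "out" split built by varying length, format, or task structure beyond what the model is assumed to have seen in training or fine-tuning.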

Read the full article on VentureBeat.

Related news:

Stop benchmarking in the lab: Inclusion Arena shows how LLMs perform in production

GEPA optimizes LLMs without costly reinforcement learning

Firefox 142 Now Available - Allows Browser Extensions/Add-Ons To Use AI LLMs