Apple Researchers Challenge AI Reasoning Claims With Controlled Puzzle Tests


Apple researchers have found that state-of-the-art "reasoning" AI models, including OpenAI's o3-mini, Gemini (with thinking mode enabled), Claude 3.7, and DeepSeek-R1, suffer a complete performance collapse beyond certain complexity thresholds when tested in controllable puzzle environments. Most striking was the counterintuitive finding that the reasoning models reduced their computational effort as problems became more difficult, despite operating well below their token generation limits. Even when the researchers provided explicit solution algorithms, so that the task required only step-by-step execution rather than creative problem-solving, the models' performance did not improve significantly.

Source: Slashdot
