Apple Researchers Challenge AI Reasoning Claims With Controlled Puzzle Tests
Apple researchers have found that state-of-the-art "reasoning" AI models, such as OpenAI's o3-mini, Gemini (with thinking mode enabled), Claude 3.7, and DeepSeek-R1, face complete performance collapse [PDF] beyond certain complexity thresholds when tested in controllable puzzle environments. Most striking was the counterintuitive finding that reasoning models actually reduced their computational effort as problems became more difficult, despite operating well below their token-generation limits. Even when researchers provided explicit solution algorithms, requiring only step-by-step execution rather than creative problem-solving, the models' performance failed to improve significantly.