Cutting-edge AI models from OpenAI and DeepSeek undergo 'complete collapse' when problems get too difficult, study reveals
A new study by Apple has ignited controversy in the AI field by showing how reasoning models undergo 'complete accuracy collapse' when overloaded with complex problems.
However, because this process is rooted in statistical guesswork about likely word sequences rather than any real understanding, chatbots have a marked tendency to 'hallucinate': throwing out erroneous responses, fabricating answers when their training data doesn't contain them, and dispensing bizarre and occasionally harmful advice to users.

To delve deeper into these issues, the authors of the new study set generic and reasoning models (including OpenAI's o1 and o3 models, DeepSeek R1, Anthropic's Claude 3.7 Sonnet and Google's Gemini) four classic puzzles to solve: river crossing, checker jumping, block stacking and the Tower of Hanoi. A brief code sketch of the last of these appears below.

Apple itself is trailing its rivals in this race, with Siri found by one analysis to be 25% less accurate than ChatGPT at answering queries, and the company is instead prioritizing the development of efficient, on-device AI over large reasoning models.
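Of the four puzzles, the Tower of Hanoi is the clearest example of how difficulty can be dialed up: the optimal solution for n disks takes 2^n - 1 moves, so each added disk roughly doubles the work. The following minimal Python sketch of the classic recursive solution is for illustration only; it is not code from the Apple study:

# Illustrative sketch, not code from the study: the classic recursive
# Tower of Hanoi solution. Solving n disks takes 2**n - 1 moves, so
# difficulty grows exponentially as more disks are added.

def hanoi(n, source="A", target="C", spare="B", moves=None):
    """Return the full move list for transferring n disks from source to target."""
    if moves is None:
        moves = []
    if n == 1:
        moves.append((source, target))
    else:
        hanoi(n - 1, source, spare, target, moves)  # park n-1 disks on the spare peg
        moves.append((source, target))              # move the largest disk
        hanoi(n - 1, spare, target, source, moves)  # restack n-1 disks onto the target
    return moves

for n in (3, 7, 10):
    print(f"{n} disks -> {len(hanoi(n))} moves")  # prints 7, 127 and 1023 moves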