ChatGPT's hallucination problem is getting worse according to OpenAI's own tests, and nobody understands why
With better reasoning ability comes even more of the wrong kind of robot dreams.
According to The New York Times, OpenAI found that o3, its most powerful system, hallucinated 33 percent of the time on its PersonQA benchmark, which involves answering questions about public figures. "The newest and most powerful technologies — so-called reasoning systems from companies like OpenAI, Google and the Chinese start-up DeepSeek — are generating more errors, not fewer," the Times reports. OpenAI's first reasoning model, o1, came out last year; the company claimed it matched the performance of PhD students in physics, chemistry, and biology, and beat them in math and coding, thanks to reinforcement learning techniques.