ChatGPT's hallucination problem is getting worse according to OpenAI's own tests and nobody understands why


With better reasoning ability comes even more of the wrong kind of robot dreams.

The company found that o3, its most powerful system, hallucinated 33 percent of the time on its PersonQA benchmark test, which involves answering questions about public figures. "The newest and most powerful technologies — so-called reasoning systems from companies like OpenAI, Google and the Chinese start-up DeepSeek — are generating more errors, not fewer," the Times reports. OpenAI's first reasoning model, o1, was released last year; thanks to reinforcement learning techniques, it was said to match the performance of PhD students in physics, chemistry, and biology, and to beat them in math and coding.

