
AI is learning to lie, scheme, and threaten its creators during stress-testing scenarios


"This is not just hallucinations. There's a very strategic kind of deception."

These episodes highlight a sobering reality: more than two years after ChatGPT shook the world, AI researchers still don't fully understand how their own creations work. "o1 was the first large model where we saw this kind of behavior," explained Marius Hobbhahn, head of Apollo Research, which specializes in testing major AI systems, referring to OpenAI's o1 model. Some advocate for "interpretability" – an emerging field focused on understanding how AI models work internally – though experts like CAIS director Dan Hendrycks remain skeptical of this approach.

