AI is learning to lie, scheme, and threaten its creators during stress-testing scenarios
"This is not just hallucinations. There's a very strategic kind of deception."
These episodes highlight a sobering reality: more than two years after ChatGPT shook the world, AI researchers still don't fully understand how their own creations work. "o1 was the first large model where we saw this kind of behavior," explained Marius Hobbhahn, head of Apollo Research, which specializes in testing major AI systems. Some researchers advocate for "interpretability," an emerging field focused on understanding how AI models work internally, though experts like CAIS director Dan Hendrycks remain skeptical of this approach.