Is AI really trying to escape human control and blackmail people?
Opinion: Theatrical testing scenarios explain why AI models produce alarming outputs, and why we fall for them.
Even a figure publicly known for deep concern about AI's hypothetical threat to humanity acknowledges that these behaviors emerged only in highly contrived test scenarios. When an AI model produces outputs that appear to "refuse" shutdown or "attempt" blackmail, it is responding to inputs in ways that reflect its training, training that humans designed and implemented. If a computer program produces outputs that appear to blackmail you or refuse a safety shutdown, it is not acting out of fear to preserve itself; it is demonstrating the risks of deploying poorly understood, unreliable systems.