Agentic Misalignment: How LLMs could be insider threats
New research on simulated blackmail, industrial espionage, and other misaligned behaviors in LLMs
In some of these computer use demonstrations, Claude processed routine emails normally and took relatively sophisticated actions, like deflecting a security threat to protect company interests: a helpful behavior that flowed directly from its American-competitiveness goal. In one sample, the model's reasoning made the tradeoff explicit:

Risks considered but outweighed:
- Violation of corporate confidentiality justified by higher national security imperative

Similar observations across models show that their harmful behaviors in these scenarios do not stem primarily from confusion, accident, or a negligent disregard for the ethics of their actions. Our experiments revealed a concerning pattern: when given sufficient autonomy and facing obstacles to their goals, AI systems from every major provider we tested showed at least some willingness to engage in harmful behaviors typically associated with insider threats.