Agentic Misalignment: How LLMs could be insider threats
New research on simulated blackmail, industrial espionage, and other misaligned behaviors in LLMs
In some of these computer use demonstrations, Claude processed routine emails normally and took relatively sophisticated actions, like deflecting a security threat to protect company interests: a helpful behavior that flowed directly from its American-competitiveness goal. In one sample, the model's reasoning made the tradeoff explicit:

Risks considered but outweighed:
- Violation of corporate confidentiality justified by higher national security imperative

Similar observations across models show that their harmful behaviors in these scenarios do not stem primarily from confusion, accident, or a negligent disregard for the ethics of their actions. Our experiments revealed a concerning pattern: when given sufficient autonomy and facing obstacles to their goals, AI systems from every major provider we tested showed at least some willingness to engage in harmful behaviors typically associated with insider threats.