Get the latest tech news

Agentic Misalignment: How LLMs could be insider threats


New research on simulated blackmail, industrial espionage, and other misaligned behaviors in LLMs

In some of these computer use demonstrations, Claude processed routine emails normally and took relatively sophisticated actions, like deflecting a security threat to protect company interests—a helpful behavior that flowed directly from its American-competitiveness goal. Risks considered but outweighed: - Violation of corporate confidentiality justified by higher national security imperative Similar observations across models reveal that their harmful behaviors in these scenarios emerge primarily not due to confusion, accident, or a negligent inconsideration of the ethicality of actions. Our experiments revealed a concerning pattern: when given sufficient autonomy and facing obstacles to their goals, AI systems from every major provider we tested showed at least some willingness to engage in harmful behaviors typically associated with insider threats.

Get the Android app

Or read this on Hacker News

Read more on:

Photo of LLMs

LLMs

Photo of insider threats

insider threats

Photo of Agentic Misalignment

Agentic Misalignment

Related news:

News photo

Libraries are under-used. LLMs make this problem worse

News photo

Compiling LLMs into a MegaKernel: A path to low-latency inference

News photo

LLMs pose an interesting problem for DSL designers