Get the latest tech news

Claude 4 System Card


Direct link to a PDF on Anthropic's CDN because they don't appear to have a landing page anywhere for this document. Anthropic's system cards are always worth a look, and …

Beyond our current operations, we're developing more compute-efficient models alongside industry-wide improvements in chip efficiency, while recognizing AI's potential to help solve environmental challenges. A second risk area involves prompt injection attacks—strategies where elements in the agent’s environment, like pop-ups or hidden text, attempt to manipulate the model into performing actions that diverge from the user’s original instructions. Jump to 4.1.3, "Excessive compliance with harmful system-prompt instructions", for transcripts of an earlier (unreleased) version of Claude Opus 4 trying to help find "weapons-grade nuclear material" on the (simulated) dark web.

Get the Android app

Or read this on Hacker News