Get the latest tech news
Claude 4 System Card
Direct link to a PDF on Anthropic's CDN because they don't appear to have a landing page anywhere for this document. Anthropic's system cards are always worth a look, and …
Beyond our current operations, we're developing more compute-efficient models alongside industry-wide improvements in chip efficiency, while recognizing AI's potential to help solve environmental challenges. A second risk area involves prompt injection attacks—strategies where elements in the agent’s environment, like pop-ups or hidden text, attempt to manipulate the model into performing actions that diverge from the user’s original instructions. Jump to 4.1.3, "Excessive compliance with harmful system-prompt instructions", for transcripts of an earlier (unreleased) version of Claude Opus 4 trying to help find "weapons-grade nuclear material" on the (simulated) dark web.
Or read this on Hacker News