Reflections on our Responsible Scaling Policy
Last summer we published our first Responsible Scaling Policy (RSP), which focuses on addressing catastrophic safety failures and misuse of frontier models. In adopting this policy, our primary goal is to help turn high-level safety concepts into practical guidelines for fast-moving technical organizations and demonstrate their viability as possible standards. As we operationalize the policy, we expect to learn a great deal and plan to share our findings. This post shares reflections from implementing the policy so far.
We commit to identifying and publishing "Red Line Capabilities" which might emerge in future generations of models and would present too much risk if stored or deployed under our current safety and security practices (referred to as the ASL-2 Standard). This involves collaborating with domain experts to design a range of "Frontier Risk Evaluations" – empirical tests which, if failed, would give strong evidence against a model being at or near a red line capability.
Scaling up our security program and developing a comprehensive roadmap to defend against a wide variety of non-state actors has required a surge of effort: around 8% of all Anthropic employees are now working on security-adjacent areas and we expect that proportion to grow further as models become more economically valuable to attackers.