Anthropic offers $15,000 bounties to hackers in push for AI safety


Anthropic has launched an expanded AI bug bounty program, offering up to $15,000 for critical vulnerabilities in its AI systems and setting new standards for AI safety and transparency.

The program targets “universal jailbreak” attacks: methods that could consistently bypass AI safety guardrails across high-risk domains such as chemical, biological, radiological, and nuclear (CBRN) threats and cybersecurity. Anthropic will invite ethical hackers to probe its next-generation safety mitigation system before public deployment, aiming to preempt potential exploits that could lead to misuse of its AI models. Beyond bug bounties, a more comprehensive approach, including extensive testing, improved interpretability, and potentially new governance structures, may be necessary to ensure AI systems remain aligned with human values as they grow more powerful.

Or read this on VentureBeat

Read more on:

Hackers

push

Anthropic

Related news:

UK opens antitrust investigation into Amazon over its ties to AI startup Anthropic

UK launches formal probe into Amazon’s ties with AI startup Anthropic

Ronin Network hacked, $12 million returned by "white hat" hackers