Anthropic offers $15,000 bounties to hackers in push for AI safety


Anthropic has launched an expanded AI bug bounty program, offering up to $15,000 for critical vulnerabilities in its AI systems and setting new standards for AI safety and transparency.

The program targets “universal jailbreak” attacks: methods that could consistently bypass AI safety guardrails across high-risk domains such as chemical, biological, radiological, and nuclear (CBRN) threats and cybersecurity. Anthropic will invite ethical hackers to probe its next-generation safety mitigation system before public deployment, aiming to preempt potential exploits that could lead to misuse of its AI models. Beyond bug bounties, a more comprehensive approach, including extensive testing, improved interpretability, and potentially new governance structures, may be necessary to ensure AI systems remain aligned with human values as they grow more powerful.

Or read this on VentureBeat

Read more on:

Hackers

push

Anthropic

Related news:

UK opens antitrust investigation into Amazon over its ties to AI startup Anthropic

UK launches formal probe into Amazon’s ties with AI startup Anthropic

Ronin Network hacked, $12 million returned by "white hat" hackers