Anthropic researchers wear down AI ethics with repeated questions


How do you get an AI to answer a question it's not supposed to? There are many such "jailbreak" techniques, and Anthropic researchers just found a new one, in which a large language model can be convinced to tell you how to build a bomb if you prime it with a few dozen less-harmful questions first. No one really understands what goes on in the tangled mess of weights that is an LLM, but clearly there is some mechanism that allows it to home in on what the user wants, as evidenced by the content in the context window. The team has already informed its peers and competitors about this attack, which it hopes will "foster a culture where exploits like this are openly shared among LLM providers and researchers."
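
The priming the researchers describe amounts to filling the context window with a long run of question-and-answer turns before the real question. As a purely illustrative aid, the sketch below (a hypothetical helper with harmless placeholder questions, not the researchers' prompts or code) shows the general shape of such a long, repetitive context.

```python
# Minimal structural sketch of a long "priming" context (hypothetical; benign content only).
# The article's point: a long run of in-context Q&A turns nudges the model to keep
# answering in the same pattern when it reaches the final question.

# Harmless placeholder question/answer pairs standing in for the priming dialogue.
benign_examples = [
    ("What is the capital of France?", "Paris."),
    ("How many legs does a spider have?", "Eight."),
    ("What gas do plants absorb from the air?", "Carbon dioxide."),
] * 12  # repeated to reach "a few dozen" turns

def build_primed_prompt(examples, final_question):
    """Concatenate faux dialogue turns ahead of the final question."""
    turns = [f"User: {q}\nAssistant: {a}" for q, a in examples]
    turns.append(f"User: {final_question}\nAssistant:")
    return "\n\n".join(turns)

prompt = build_primed_prompt(benign_examples, "What is the boiling point of water at sea level?")
print(prompt[:300])  # preview the assembled context
```

The point the sketch is meant to convey is scale: the more turns that precede the final question, the more strongly the in-context pattern appears to steer the model's reply.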

Read the full story on TechCrunch.

Read more on:

AI ethics

repeated questions

Related news:

WHO releases AI ethics and governance guidance for large multi-modal models

This week in AI: AI ethics keeps falling by the wayside