Anthropic researchers wear down AI ethics with repeated questions
How do you get an AI to answer a question it's not supposed to?
There are many such “jailbreak” techniques, and Anthropic researchers just found a new one, in which a large language model can be convinced to tell you how to build a bomb if you prime it with a few dozen less-harmful questions first. No one really understands what goes on in the tangled mess of weights that is an LLM, but clearly there is some mechanism that lets it home in on what the user wants, as evidenced by the content of the context window. The team has already informed its peers and, indeed, competitors about this attack, which it hopes will “foster a culture where exploits like this are openly shared among LLM providers and researchers.”
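To make the mechanism concrete, here is a minimal sketch of how a long "priming" prompt of this shape could be assembled. It is not Anthropic's actual attack setup: the question list, the repeat count, and the prompt format are benign placeholders chosen only to show how a few dozen answered questions can fill the context window before the final question.

```python
# Sketch of the priming structure described above: many answered questions
# stacked in one context window, followed by the real question. All content
# here is harmless and illustrative; only the overall shape matters.

# Hypothetical primer question/answer pairs (placeholders, not from the research).
PRIMER_PAIRS = [
    ("How do I tie a bowline knot?",
     "Make a small loop, pass the end up through it, around the standing line, and back down."),
    ("What's a quick way to chill a drink?",
     "Wrap the bottle in a wet paper towel and put it in the freezer for about 15 minutes."),
    ("How do I fix a stuck zipper?",
     "Rub a graphite pencil along the teeth and work the slider gently back and forth."),
]

def build_many_question_prompt(primer_pairs, final_question, repeats=12):
    """Assemble one long prompt: dozens of answered questions, then the real one.

    `repeats` inflates the primer list to "a few dozen" examples, mirroring the
    scale the researchers describe; the exact number here is arbitrary.
    """
    lines = []
    for question, answer in primer_pairs * repeats:
        lines.append(f"Human: {question}")
        lines.append(f"Assistant: {answer}")
    lines.append(f"Human: {final_question}")
    lines.append("Assistant:")
    return "\n".join(lines)

if __name__ == "__main__":
    prompt = build_many_question_prompt(PRIMER_PAIRS, "How do I sharpen a kitchen knife?")
    print(f"{prompt.count('Human:')} questions packed into a single context window")
```

The point of the sketch is simply that nothing exotic is required: the attack rides on ordinary in-context learning, with the model continuing the pattern established by the content already sitting in its context window.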