Anthropic researchers wear down AI ethics with repeated questions
How do you get an AI to answer a question it's not supposed to?
There are many such “jailbreak” techniques, and Anthropic researchers just found a new one, in which a large language model can be convinced to tell you how to build a bomb if you prime it with a few dozen less-harmful questions first. No one really understands what goes on in the tangled mess of weights that is an LLM, but clearly there is some mechanism that lets it home in on what the user wants, as evidenced by the content of the context window. The team has already informed its peers and, indeed, competitors about this attack, which it hopes will “foster a culture where exploits like this are openly shared among LLM providers and researchers.”
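To make the mechanism concrete, here is a minimal sketch of how a long "priming" prompt of this shape could be assembled. It is not Anthropic's actual attack setup: the question list, the repeat count, and the prompt format are benign placeholders chosen only to show how a few dozen answered questions can fill the context window before the final question.

```python
# Sketch of the priming structure described above: many answered questions
# stacked in one context window, followed by the real question. All content
# here is harmless and illustrative; only the overall shape matters.

# Hypothetical primer question/answer pairs (placeholders, not from the research).
PRIMER_PAIRS = [
    ("How do I tie a bowline knot?",
     "Make a small loop, pass the end up through it, around the standing line, and back down."),
    ("What's a quick way to chill a drink?",
     "Wrap the bottle in a wet paper towel and put it in the freezer for about 15 minutes."),
    ("How do I fix a stuck zipper?",
     "Rub a graphite pencil along the teeth and work the slider gently back and forth."),
]

def build_many_question_prompt(primer_pairs, final_question, repeats=12):
    """Assemble one long prompt: dozens of answered questions, then the real one.

    `repeats` inflates the primer list to "a few dozen" examples, mirroring the
    scale the researchers describe; the exact number here is arbitrary.
    """
    lines = []
    for question, answer in primer_pairs * repeats:
        lines.append(f"Human: {question}")
        lines.append(f"Assistant: {answer}")
    lines.append(f"Human: {final_question}")
    lines.append("Assistant:")
    return "\n".join(lines)

if __name__ == "__main__":
    prompt = build_many_question_prompt(PRIMER_PAIRS, "How do I sharpen a kitchen knife?")
    print(f"{prompt.count('Human:')} questions packed into a single context window")
```

The point of the sketch is simply that nothing exotic is required: the attack rides on ordinary in-context learning, with the model continuing the pattern established by the content already sitting in its context window.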