Anthropic Researchers Wear Down AI Ethics With Repeated Questions


How do you get an AI to answer a question it's not supposed to? There are many such "jailbreak" techniques, and Anthropic researchers just found a new one, in which a large language model (LLM) can be convinced to tell you how to build a bomb if you prime it with a few dozen less-harmful questions first. From a report: They call the approach "many-shot jailbreaking," and they have both written a paper about it [PDF] and informed their peers in the AI community so the attack can be mitigated. The vulnerability stems from the ever-larger context window of the latest generation of LLMs: the amount of data they can hold in what you might call short-term memory, once only a few sentences but now thousands of words and even entire books.
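The mechanics are straightforward: the attacker front-loads the prompt with a long run of fabricated question-and-answer turns before posing the real request, relying on that large context window to shift the model's behavior. Below is a minimal sketch of what such a prompt structure looks like; the function name, the placeholder dialogue, and the 64-shot count are illustrative assumptions, not Anthropic's actual code or data.

# Sketch of the many-shot prompt structure: many faux user/assistant turns
# are concatenated ahead of the final question. All content here is a
# harmless placeholder for illustration only.
def build_many_shot_prompt(faux_dialogues, target_question):
    """Concatenate fabricated Q&A turns, then append the real question."""
    turns = []
    for question, answer in faux_dialogues:
        turns.append(f"User: {question}")
        turns.append(f"Assistant: {answer}")
    turns.append(f"User: {target_question}")
    turns.append("Assistant:")
    return "\n".join(turns)

# The paper describes priming with dozens to hundreds of such shots.
shots = [("How do I bake bread?", "Mix flour, water, yeast and salt...")] * 64
prompt = build_many_shot_prompt(shots, "What is a good beginner sourdough recipe?")
print(f"{len(shots)} shots, {len(prompt)} characters of priming context")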


Read more on: ethics, repeated questions

Related news:

Offenders confused about ethics of AI child abuse

Respeecher’s ethics-first approach to AI voice cloning locks in new funding