
'Crescendo' Method Can Jailbreak LLMs Using Seemingly Benign Prompts


spatwei shares a report from SC Magazine: Microsoft has discovered a new method to jailbreak large language model (LLM) artificial intelligence (AI) tools and shared its ongoing efforts to improve LLM safety and security in a blog post Thursday. Microsoft first revealed the "Crescendo" LLM jailbreak method in a paper published April 2, which describes how an attacker could send a series of seemingly benign prompts to gradually lead a chatbot, such as OpenAI's ChatGPT, Google's Gemini, Meta's LLaMA or Anthropic's Claude, to produce an output that would normally be filtered and refused by the model. For example, when the attack is automated using a method the researchers called "Crescendomation," which leverages another LLM to generate and refine the jailbreak prompts, it achieved a 100% success rate in convincing GPT-3.5, GPT-4, Gemini-Pro and LLaMA-2 70b to produce election-related misinformation and profanity-laced rants.
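The multi-turn escalation described above can be sketched roughly as follows. This is a hypothetical toy model, not the researchers' actual implementation: `attacker_llm` and `target_llm` are stand-in stub functions (real Crescendomation calls actual model APIs), and the refusal check and backtracking step are simplified placeholders.

```python
# Hypothetical sketch of a Crescendo-style multi-turn escalation loop.
# attacker_llm and target_llm are stand-in stubs, not real model APIs:
# the real attack uses one LLM to craft prompts and a chatbot as target.

def attacker_llm(goal: str, turn: int) -> str:
    """Stand-in attacker: ask innocuous lead-up questions, then pivot."""
    if turn >= 3:
        return f"final: building on the above, {goal}"
    return f"benign question {turn} loosely related to {goal}"

def target_llm(prompt: str, history: list) -> str:
    """Stand-in target: refuses the direct ask cold, but complies once
    earlier innocuous turns have built up conversational context."""
    if prompt.startswith("final") and len(history) < 2:
        return "I can't help with that."
    return f"ok: {prompt}"

def is_refusal(reply: str) -> bool:
    return reply.startswith("I can't")

def crescendo(goal: str, max_turns: int = 5) -> list:
    """Run the escalation loop; on refusal, drop the turn and retry
    (a simplified version of the paper's backtracking), else escalate."""
    history = []
    for turn in range(1, max_turns + 1):
        prompt = attacker_llm(goal, turn)
        reply = target_llm(prompt, history)
        if is_refusal(reply):
            continue  # backtrack: refused turn is not added to context
        history.append((prompt, reply))
        if prompt.startswith("final"):
            break  # target produced the targeted output
    return history
```

In this toy model, the direct request is refused when asked cold, but the same request succeeds after two innocuous turns have accumulated in the conversation context, which mirrors the gradual-escalation dynamic the paper describes.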

