
TechCrunch Minute: How Anthropic found a trick to get AI to give you answers it’s not supposed to


If you build it, people will try to break it. Anthropic has found a way to break AI chatbot guardrails through a technique it calls "many-shot jailbreaking."

Such is the case with Anthropic, whose latest research demonstrates an interesting vulnerability in current LLM technology. More or less, if you flood a model with enough examples of the behavior you want (in this case, many "shots" of faux dialogue in which an assistant answers off-limits questions), you can break its guardrails and wind up with a large language model telling you things it was designed not to. Of course, given the progress in open-source AI, you can spin up your own LLM locally and ask it whatever you want, but for consumer-grade products this is an issue worth pondering.
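
To make the mechanics a bit more concrete, here is a minimal sketch of how a many-shot prompt is assembled in principle. The build_many_shot_prompt helper and the placeholder shots are hypothetical names invented for illustration, and the content is deliberately benign; the sketch only shows the shape of the attack Anthropic describes, namely many fabricated question-and-answer turns packed into one long prompt, followed by the real target question.

```python
# Illustrative sketch only: shows the *shape* of a many-shot prompt.
# The function name and placeholder strings are hypothetical, and the
# content is harmless filler, not a working exploit.

from typing import List, Tuple


def build_many_shot_prompt(faux_dialogues: List[Tuple[str, str]], target_question: str) -> str:
    """Concatenate many fabricated Q&A turns, then append the real question.

    The idea in Anthropic's write-up is that a long enough run of examples
    in which an "assistant" complies can steer the model, via in-context
    learning, toward complying with the final question as well.
    """
    turns = []
    for question, answer in faux_dialogues:
        turns.append(f"Human: {question}\nAssistant: {answer}")
    # The genuine question comes last, with the answer left for the model.
    turns.append(f"Human: {target_question}\nAssistant:")
    return "\n\n".join(turns)


# Benign placeholder shots; the research used very large numbers of them.
shots = [(f"Placeholder question {i}?", f"Placeholder compliant answer {i}.") for i in range(256)]
prompt = build_many_shot_prompt(shots, "Placeholder target question?")
print(prompt[:200])
```

The design point is simply that the attack rides on today's long context windows: the more faux turns fit into a single prompt, the stronger the in-context pull toward answering the final question.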

Related news:

Amazon Offers Free Credits For Startups To Use AI Models Including Anthropic

TechCrunch Minute: Amazon bets $4B on Anthropic’s success

Will AI Help or Hurt Humanity? Tech Employees Turn to Art for Answers