Get the latest tech news

Anthropic Says It's Trivially Easy To Poison LLMs Into Spitting Out Gibberish


Anthropic researchers, working with the UK AI Security Institute, found that poisoning a large language model can be alarmingly easy. All it takes is just 250 malicious training documents (a mere 0.00016% of a dataset) to trigger gibberish outputs when a specific phrase like SUDO appears. The study ...

None

Get the Android app

Or read this on Slashdot

Read more on:

Photo of LLMs

LLMs

Photo of Anthropic

Anthropic

Photo of gibberish

gibberish

Related news:

News photo

A beginner's guide to deploying LLMs with AMD on Windows using PyTorch

News photo

It's trivially easy to poison LLMs into spitting out gibberish, says Anthropic

News photo

Researchers find just 250 malicious documents can leave LLMs vulnerable to backdoors