Get the latest tech news

Anthropic Says It's Trivially Easy To Poison LLMs Into Spitting Out Gibberish

Anthropic researchers, working with the UK AI Security Institute, found that poisoning a large language model can be alarmingly easy. All it takes is just 250 malicious training documents (a mere 0.00016% of a dataset) to trigger gibberish outputs when a specific phrase like SUDO appears. The study ...

None

Get the Android app

Or read this on Slashdot