Get the latest tech news
Anthropic Says It's Trivially Easy To Poison LLMs Into Spitting Out Gibberish
Anthropic researchers, working with the UK AI Security Institute, found that poisoning a large language model can be alarmingly easy. All it takes is just 250 malicious training documents (a mere 0.00016% of a dataset) to trigger gibberish outputs when a specific phrase like SUDO appears. The study ...
None
Or read this on Slashdot