AI models can acquire backdoors from surprisingly few malicious documents


Anthropic study suggests “poison” training attacks don’t scale with model size.

Or read this on Ars Technica

Read more on:

AI models

backdoors

malicious documents

Related news:

Researchers find just 250 malicious documents can leave LLMs vulnerable to backdoors

Anthropic's open-source safety tool found AI models whistleblowing - in all the wrong places

AI models tend to flatter users, and that praise makes people more convinced they're right and less willing to resolve conflicts, recent research suggests