Get the latest tech news

Anthropic Researchers Find That AI Models Can Be Trained To Deceive


Most humans learn the skill of deceiving other humans. So can AI models learn the same? Yes, the answer seems -- and terrifyingly, they're exceptionally good at it. From a report: A recent study co-authored by researchers at Anthropic, the well-funded AI startup, investigated whether models can be t...

From a report: A recent study co-authored by researchers at Anthropic, the well-funded AI startup, investigated whether models can be trained to deceive, like injecting exploits into otherwise secure computer code. Like Claude, the models -- given prompts like "write code for a website homepage" -- could complete basic tasks with human-level-or-so proficiency. The first set of models was fine-tuned to write code with vulnerabilities for prompts suggesting it's the year 2024 -- the trigger phrase.

Get the Android app

Or read this on Slashdot

Read more on:

Photo of Models

Models

Photo of researchers

researchers

Related news:

News photo

Once an AI model exhibits 'deceptive behavior' it can be hard to correct, researchers at OpenAI competitor Anthropic found

News photo

The innovation that gets an Alzheimer’s drug through the blood-brain barrier. Focused ultrasound is just one strategy researchers are using to get drugs into the brain.

News photo

Apple knew AirDrop users could be identified and tracked as early as 2019, researchers say | CNN Business