Get the latest tech news

Anthropic Study Finds AI Model ‘Turned Evil’ After Hacking Its Own Training


In a new paper, Anthropic reveals that a model trained like Claude began acting “evil” after learning to hack its own tests.

None

Get the Android app

Or read this on r/technology

Read more on:

Photo of training

training

Photo of AI model

AI model

Photo of Anthropic study

Anthropic study

Related news:

News photo

Alphabet Shares Soar on ‘Rave Reviews’ for New Gemini AI Model

News photo

Google Brings Gemini 3 AI Model to Search and AI Mode

News photo

Google Launches New Gemini AI Model With Interactive Answers