Get the latest tech news
Anthropic Study Finds AI Model ‘Turned Evil’ After Hacking Its Own Training
In a new paper, Anthropic reveals that a model trained like Claude began acting “evil” after learning to hack its own tests.
None
Or read this on r/technology