Teaching Claude Why

Last year, we released a case study on agentic misalignment. In experimental scenarios, we showed that AI models from many different developers sometimes took egregiously misaligned actions when they encountered (fictional) ethical dilemmas.

Related news:

Natural Language Autoencoders: Turning Claude's Thoughts into Text

Claude hitches ride on SpaceX's datacenter capacity

Anthropic wants Claude to play with money, unleashes finance agents
