How does misalignment scale with model intelligence and task complexity?
📄 Paper, 💻 Code

Research done as part of the first Anthropic Fellows Program during Summer 2025.

When AI systems fail, will they fail by systematically pursuing the wrong goals, or by being a hot mess? We decompose the errors of frontier reasoning models into bias (systematic) and variance (incoherent) components, and find that, as tasks get harder and reasoning gets longer, model failures become increasingly dominated by incoherence rather than systematic misalignment.
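To make the framing concrete, here is a minimal sketch of a bias-variance decomposition applied to repeated model attempts on a single task. The function name, the scoring scale, and the synthetic data are illustrative assumptions, not the paper's actual methodology or metrics.

```python
import numpy as np

def bias_variance_decomposition(scores: np.ndarray, target: float) -> tuple[float, float]:
    """Decompose the mean squared error of repeated attempts into
    bias^2 (systematic deviation from the target behavior) and
    variance (incoherent scatter across attempts).

    scores: per-attempt behavioral scores on one task (hypothetical metric)
    target: the score an aligned model would achieve on that task
    """
    mean_score = scores.mean()
    bias_sq = (mean_score - target) ** 2   # systematic component
    variance = scores.var()                # incoherent component
    # Note: MSE across attempts = bias^2 + variance
    return bias_sq, variance

# Example: 20 sampled attempts on one task, with an aligned target score of 1.0
rng = np.random.default_rng(0)
attempts = rng.normal(loc=0.8, scale=0.3, size=20)
bias_sq, var = bias_variance_decomposition(attempts, target=1.0)
print(f"bias^2 = {bias_sq:.3f}, variance = {var:.3f}")
```

Under this decomposition, a "systematically misaligned" failure mode shows up as large bias with small variance, while a "hot mess" shows up as small bias with large variance.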