Anthropic overtakes OpenAI: Claude Opus 4 codes seven hours nonstop, sets record SWE-Bench score and reshapes enterprise AI

Anthropic's Claude Opus 4 outperforms OpenAI's GPT-4.1 with unprecedented seven-hour autonomous coding sessions and record-breaking 72.5% SWE-bench score, transforming AI from quick-response tool to day-long collaborator.

The technological implications are profound: AI systems can now handle complex software engineering projects from conception to completion, maintaining context and focus throughout an entire workday.

The technical implementation works much as human experts build knowledge management systems: the AI automatically organizes information into structured formats optimized for future retrieval.

An Anthropic study found that Claude 3.7 Sonnet mentioned the crucial hints it used to solve problems only 25% of the time, raising significant questions about the transparency of AI reasoning.

Read this on VentureBeat