Get the latest tech news

New Claude 4 AI model refactored code for 7 hours straight


Anthropic says Claude 4 beats Gemini on coding benchmarks; works autonomously for hours.

Alex Albert, Anthropic's head of Claude Relations, told Ars Technica that the company chose to revive the Opus line because of growing demand for agentic AI applications. Instead, these more agentic models can iteratively search the web, parse the results, analyze images, and spin up coding tasks for analysis in ways that can avoid falling into a confabulation trap by relying solely on pure LLM outputs. Even with Anthropic's future riding on the capability of these new models, when we asked about how they guide Claude's behavior by fine-tuning, Albert acknowledged that the inherent unpredictability of these systems presents ongoing challenges for both them and developers.

Get the Android app

Or read this on ArsTechnica

Read more on:

Photo of Code

Code

Photo of hours

hours

Photo of AI model

AI model

Related news:

News photo

Anthropic overtakes OpenAI: Claude Opus 4 codes seven hours nonstop, sets record SWE-Bench score and reshapes enterprise AI

News photo

DOGE Used a Meta AI Model to Review Emails From Federal Workers

News photo

Anthropic’s new Claude 4 AI models can reason over many steps