Measuring AI Ability to Complete Long Tasks


We propose measuring AI performance in terms of the *length* of tasks AI agents can complete. We show that this metric has increased exponentially and consistently over the past 6 years, with a doubling time of around 7 months. Extrapolating this trend predicts that, in under a decade, AI agents will be able to independently complete a large fraction of software tasks that currently take humans days or weeks.
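
The arithmetic behind this extrapolation can be sketched as follows. This is a minimal illustration, not the authors' method: it assumes a pure exponential with the quoted 7-month doubling time, and the function names and the 1-hour starting horizon used in the example are hypothetical, chosen only to show the calculation.

```python
import math

# Doubling time quoted in the text ("around 7 months").
DOUBLING_MONTHS = 7.0


def horizon_after(h0_hours: float, months: float,
                  doubling_months: float = DOUBLING_MONTHS) -> float:
    """Task length (hours) after `months`, assuming pure exponential growth."""
    return h0_hours * 2 ** (months / doubling_months)


def months_to_reach(h0_hours: float, target_hours: float,
                    doubling_months: float = DOUBLING_MONTHS) -> float:
    """Months needed to grow from `h0_hours` to `target_hours`."""
    return doubling_months * math.log2(target_hours / h0_hours)


if __name__ == "__main__":
    # Hypothetical starting point: a 1-hour task horizon (not from the source).
    start = 1.0
    for target in (8.0, 40.0, 160.0):  # a work day, a work week, a work month
        m = months_to_reach(start, target)
        print(f"{target:6.0f} h reached after {m:5.1f} months ({m / 12:.1f} years)")
```

Under these assumptions, 6 years (72 months) corresponds to a little over 10 doublings, or growth on the order of a thousandfold. The result is sensitive to both the starting horizon and the doubling time, so the printed figures should be read as an illustration of the trend rather than a forecast.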
