Get the latest tech news
Failing to Understand the Exponential, Again
Posts and writings by Julian Schrittwieser
Long after the timing and scale of the coming global pandemic was obvious from extrapolating the exponential trends, politicians, journalists and most public commentators kept treating it as a remote possibility or a localized phenomenon. We can observe a clear exponential trend, with Sonnet 3.7 achieving the best performance by completing tasks up to an hour in length at 50% success rate. Fortunately for us, OpenAI also included other models in the evaluation, and we can see that Claude Opus 4.1 (released earlier than GPT-5) performs significantly better - ahead of the trend from the previous graph, and already almost matching industry expert (!)
Or read this on Hacker News