The rise of AI ‘reasoning’ models is making benchmarking more expensive
The rise of AI ‘reasoning’ models is making benchmarking more expensive, data from Artificial Analysis shows.
Artificial Analysis co-founder George Cameron told TechCrunch that the organization plans to increase its benchmarking spend as more AI labs develop reasoning models.

Taylor estimates that a single run-through of MMLU Pro, a question set designed to benchmark a model’s language comprehension skills, would have cost more than $1,800.

But an AI lab’s involvement in evaluations colors the results, some experts say. Even if there is no evidence of manipulation, the mere suggestion of a lab’s involvement threatens the integrity of the evaluation scores.