The rise of AI ‘reasoning’ models is making benchmarking more expensive
The rise of AI ‘reasoning’ models is making benchmarking more expensive, data from Artificial Analysis shows.
Artificial Analysis co-founder George Cameron told TechCrunch that the organization plans to increase its benchmarking spend as more AI labs develop reasoning models.

Taylor estimates that a single run-through of MMLU Pro, a question set designed to benchmark a model’s language comprehension skills, would have cost more than $1,800.

But an AI lab’s involvement in evaluations colors the results, some experts say. Even if there is no evidence of manipulation, the mere suggestion of a lab’s involvement threatens the integrity of the evaluation scores.