Gemini 2.5 gets 24.4% on MathArena USAMO, beating previous top score of 4.7%


MathArena: Evaluating LLMs on Uncontaminated Math Competitions

For open-source models, costs can vary significantly depending on the chosen API provider, and our results may not always be achieved using the most cost-effective option. For gemini-2.0-flash-thinking, it was impossible to determine the cost because pay-as-you-go pricing is not available and the Google API does not return the number of thinking tokens. If you find any contamination (e.g., problems that have appeared before), feel free to contact us and we will add this information to the table.
