Gemini 2.5 gets 24.4% on MathArena USAMO, beating previous top score of 4.7%
MathArena: Evaluating LLMs on Uncontaminated Math Competitions
For open-source models, costs can vary significantly depending on the chosen API provider, and our results were not necessarily obtained using the most cost-effective option. For gemini-2.0-flash-thinking, the cost could not be determined because pay-as-you-go pricing is not available and the Google API does not return the number of thinking tokens. If you find any contamination (e.g., problems that have appeared before), feel free to contact us and we will add this information to the table.