New secret math benchmark stumps AI models and PhDs alike
FrontierMath’s difficult questions remain unpublished so that AI companies can’t train against them.
The benchmark tests AI language models (such as GPT-4o, which powers ChatGPT) against original mathematics problems that typically take specialist mathematicians hours or days to complete. The designers made the problems "guessproof" by requiring large numerical answers or complex mathematical solutions, leaving less than a 1 percent chance that a random guess is correct. "Because an AI system has vastly greater computational power, it's actually possible to design problems with easily verifiable solutions using the same idea that IOI or Project Euler does—basically, 'write a proof' is replaced by 'implement an algorithm in code,'" Chen explained.
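As a rough illustration of that grading idea (a hypothetical Project Euler-style example, not an actual FrontierMath problem or its real harness), an answer that is a single large integer can be checked by exact comparison against a reference computation, so a random guess essentially never passes:

```python
# Illustrative sketch only: a Project Euler-style problem (sum of all primes
# below 2,000,000) whose answer is one large integer, graded by exact match.

def reference_answer(limit: int = 2_000_000) -> int:
    """Compute the sum of all primes below `limit` with a simple sieve."""
    sieve = bytearray([1]) * limit
    sieve[0:2] = b"\x00\x00"  # 0 and 1 are not prime
    for n in range(2, int(limit ** 0.5) + 1):
        if sieve[n]:
            # Mark multiples of n as composite
            sieve[n * n::n] = bytearray(len(sieve[n * n::n]))
    return sum(i for i, is_prime in enumerate(sieve) if is_prime)

def grade(submitted: int) -> bool:
    """Only the precise integer counts as correct—no partial credit."""
    return submitted == reference_answer()

if __name__ == "__main__":
    print(grade(142913828922))  # True: the known answer to Project Euler #10
    print(grade(142913828921))  # False: off by one fails outright
```

Because the answer space is enormous and the check is exact, verifying a submission is cheap even though producing the answer requires real mathematical or algorithmic work—the property Chen describes.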