OpenAI quietly funded independent math benchmark before setting record with o3
OpenAI's role in funding FrontierMath, a leading AI math benchmark, only came to light when the company announced its record-breaking performance on the test. Now the benchmark's developer, Epoch AI, acknowledges that it should have been more transparent about the relationship.
"For future collaborations, we will strive to improve transparency wherever possible, ensuring contributors have clearer information about funding sources, data access, and usage purposes at the outset," Epoch AI's Tamay Besiroglu writes. The lack of transparency does not in itself undermine the benchmark's quality or significance, but such an important tool for AI evaluation deserved complete openness from the start, especially since mathematical reasoning is a major weakness of language models and improved logical performance could signal a breakthrough. OpenAI had early access to many of the benchmark's tasks and solutions. Although there was a verbal agreement that this data would not be used for training, the lack of transparency surrounding the arrangement remains problematic.