Think of a Number
My feed was recently clogged up with news articles reporting that Sam Altman thinks that AGI is here, or will be here next year, or whatever. I will refrain from giving even more air to this nonsen…
But shortly after that, Epoch AI (who put the dataset together) revealed that 25 percent of the questions were undergraduate or olympiad level, and this week it transpires that OpenAI were behind the funding of the dataset and reportedly were even given access to some of the questions (added later: in fact Elliot Glazer from Epoch AI has confirmed this on Reddit). All of a sudden 25 percent doesn’t sound so great any more. Because LLMs are not up to writing proofs, the answers unfortunately need to be non-negative integers: there is simply no point asking an LLM basic research level questions and then ploughing through the crap they produce and giving it 0/10. Anyone who thinks that LLMs are actually capable of writing proofs should show me one that can do the Putnam, because all efforts I saw on the 2024 exam were abysmal, and this is undergraduate level. As always, the difficulty here is that soon after the questions and solutions are made public, the models train on them and the experiment is thus invalidated. Ideally the answer is not easily guessable, because I have heard reports of LLMs getting questions on the FrontierMath dataset right, backed up by completely spurious reasoning.