Get the latest tech news

Can AI do maths yet? Thoughts from a mathematician

So the big news this week is that o3, OpenAI’s new language model, got 25% on FrontierMath. Let’s start by explaining what this means.

As an academic mathematician who spent their entire life collaborating openly on research problems and sharing my ideas with other people, it frustrates me a little that already in this paragraph we’ve seen more questions than answers — I am not even to give you a coherent description of some basic facts about this dataset, for example, its size. This claim is a little confusing because I would be hard pressed to apply such adjectives to any of the five publically-released problems in the dataset; even the simplest one used the Weil conjectures for curves (or a brute force argument which is probably just about possible but would be extremely painful, as it involves factoring 10^12 degree 3 polynomials over a finite field, although this could certainly be parallelised). I am dreading the inevitable onslaught in a year or two of language model “proofs” of the Riemann hypothesis which will just contain claims which are vague or inaccurate in the middle of 10 pages of correct mathematics which the human will have to wade through to find the line which doesn’t hold up.

Get the Android app

Or read this on Hacker News