
AI Systems Solve Less Than 2% of Advanced Maths Problems in New Benchmark Test



Leading AI systems are solving less than 2% of problems in a new advanced mathematics benchmark, revealing significant limitations in their reasoning capabilities, research group Epoch AI reported this week. The benchmark, called FrontierMath, consists of hundreds of original research-level mathematics problems developed in collaboration with over 60 mathematicians, including Fields Medalists Terence Tao and Timothy Gowers. While top AI models like GPT-4 and Gemini 1.5 Pro achieve over 90% accuracy on traditional math tests, they struggle with FrontierMath's problems, which range from computational number theory to algebraic geometry and require complex reasoning.



