AI Systems Solve Under 2% of Advanced Maths Problems in New Benchmark Test
Leading AI systems are solving less than 2% of problems in a new advanced mathematics benchmark, revealing significant limitations in their reasoning capabilities, research group Epoch AI reported this week. The benchmark, called FrontierMath, consists of hundreds of original, research-level mathematics problems developed in collaboration with more than 60 mathematicians, including Fields Medalists Terence Tao and Timothy Gowers. While top AI models such as GPT-4 and Gemini 1.5 Pro achieve over 90% accuracy on traditional math tests, they struggle with FrontierMath's problems, which range from computational number theory to algebraic geometry and require complex reasoning.