
AI Systems Solve Less Than 2% of Advanced Maths Problems in New Benchmark Test



Leading AI systems are solving less than 2% of problems in a new advanced mathematics benchmark, revealing significant limitations in their reasoning capabilities, research group Epoch AI reported this week. The benchmark, called FrontierMath, consists of hundreds of original research-level mathematics problems developed in collaboration with over 60 mathematicians, including Fields Medalists Terence Tao and Timothy Gowers. While top AI models like GPT-4 and Gemini 1.5 Pro achieve over 90% accuracy on traditional math tests, they struggle with FrontierMath's problems, which range from computational number theory to algebraic geometry and require complex reasoning.



