Get the latest tech news

AI Models Still Struggle To Debug Software, Microsoft Study Shows

Some of the best AI models today still struggle to resolve software bugs that wouldn't trip up experienced devs. TechCrunch: A new study from Microsoft Research, Microsoft's R&D division, reveals that models, including Anthropic's Claude 3.7 Sonnet and OpenAI's o3-mini, fail to debug many issues...

The results are a sobering reminder that, despite bold pronouncements from companies like OpenAI, AI is still no match for human experts in domains such as coding. The study's co-authors tested nine different models as the backbone for a "single prompt-based agent" that had access to a number of debugging tools, including a Python debugger. According to the co-authors, even when equipped with stronger and more recent models, their agent rarely completed more than half of the debugging tasks successfully.

Get the Android app

Or read this on Slashdot