Get the latest tech news

AI Models Still Struggle To Debug Software, Microsoft Study Shows


Some of the best AI models today still struggle to resolve software bugs that wouldn't trip up experienced devs. TechCrunch: A new study from Microsoft Research, Microsoft's R&D division, reveals that models, including Anthropic's Claude 3.7 Sonnet and OpenAI's o3-mini, fail to debug many issues...

The results are a sobering reminder that, despite bold pronouncements from companies like OpenAI, AI is still no match for human experts in domains such as coding. The study's co-authors tested nine different models as the backbone for a "single prompt-based agent" that had access to a number of debugging tools, including a Python debugger. According to the co-authors, even when equipped with stronger and more recent models, their agent rarely completed more than half of the debugging tasks successfully.

Get the Android app

Or read this on Slashdot

Read more on:

Photo of Microsoft

Microsoft

Photo of software

software

Photo of Models

Models

Related news:

News photo

Microsoft is About To Launch Recall For Real This Time

News photo

AI models still struggle to debug software, Microsoft study shows

News photo

Microsoft releases emergency update to fix Office 2016 crashes