AI can fix bugs—but can’t find them: OpenAI’s study highlights limits of LLMs in software engineering
A new benchmark from OpenAI researchers found that LLMs could not resolve some freelance coding tasks, and so failed to earn the tasks' full value.
In a new paper, OpenAI researchers detail how they developed an LLM benchmark called SWE-Lancer to test how much foundation models can earn from real-life freelance software engineering tasks. “Tests simulate real-world user flows, such as logging into the application, performing complex actions (making financial transactions) and verifying that the model’s solution works as expected,” the paper explains. However, the researchers found that models often exhibit a limited understanding of how an issue spans multiple components or files and fail to address the root cause, leading to solutions that are incorrect or insufficiently comprehensive.
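To make the evaluation style concrete, here is a minimal sketch of the kind of end-to-end user-flow test the paper describes, written with Playwright for Python. The URL, selectors, and credentials are hypothetical placeholders for illustration only, not the actual SWE-Lancer harness.

```python
# Sketch of an end-to-end user-flow test in the style the paper describes:
# log in, perform a financial transaction, verify the expected outcome.
# All URLs, selectors, and credentials below are hypothetical placeholders.
from playwright.sync_api import sync_playwright


def test_send_payment_flow():
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()

        # 1. Log into the application with a hypothetical test account.
        page.goto("https://app.example.com/login")
        page.fill("#email", "test-user@example.com")
        page.fill("#password", "test-password")
        page.click("button[type=submit]")
        page.wait_for_selector("#dashboard")

        # 2. Perform a complex action: send a payment.
        page.click("#new-payment")
        page.fill("#recipient", "payee@example.com")
        page.fill("#amount", "25.00")
        page.click("#confirm-payment")

        # 3. Verify that the model's patch produced the expected behavior.
        confirmation = page.inner_text("#payment-status")
        assert "Payment sent" in confirmation

        browser.close()


if __name__ == "__main__":
    test_send_payment_flow()
```

A model's patch only earns credit if tests like this pass end to end, which is why fixes that miss the root cause across multiple files score poorly.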
Or read this on VentureBeat