Many SWE-bench-Passing PRs would not be merged
We find that roughly half of the test-passing SWE-bench Verified PRs written by recent AI agents would not be merged into main by repository maintainers. A naive reading of benchmark scores may therefore lead one to overestimate how useful agents are without further elicitation or human feedback.

