SWE-bench Verified no longer measures frontier coding capabilities

Related news:

N-Day-Bench – Can LLMs find real vulnerabilities in real codebases?

Many SWE-bench-Passing PRs would not be merged

15× vs. ~1.37×: Recalculating GPT-5.3-Codex-Spark on SWE-Bench Pro