CompileBench: Can AI Compile 22-year-old Code?


We tested 19 LLMs on their ability to handle real-world software engineering tasks like compiling old code and cross-compiling. See how Anthropic, OpenAI, and Google models stack up in our new benchmark – CompileBench.

We tested 19 state-of-the-art LLMs on 15 real-world tasks using the unmodified source code of popular open-source projects, including curl (HTTP client), GNU Coreutils (utilities like cp, ls, mv), and jq (command-line JSON processor). For example, for curl we check whether the model created an actual executable, whether it reports the correct version matching the source code, and whether it can successfully make HTTP requests.
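The three curl checks described above could be sketched roughly as follows. This is an illustrative sketch, not CompileBench's actual harness: the function name `check_curl_build`, the result keys, and the injectable `run` parameter are all assumptions made for this example. It relies only on curl's standard `--version`, `--silent`, `--fail`, and `--output` options, and on the first line of `curl --version` beginning with `curl <version>`.

```python
import os
import subprocess
from pathlib import Path


def check_curl_build(binary, expected_version, url=None, run=subprocess.run):
    """Run three sanity checks on a freshly built curl binary.

    Hypothetical helper for illustration only; `run` is injectable so the
    logic can be exercised without a real curl binary or network access.
    """
    results = {"is_executable": False, "version_matches": False, "http_ok": None}

    # Check 1: the build produced a regular file with the execute bit set.
    path = Path(binary)
    results["is_executable"] = path.is_file() and os.access(path, os.X_OK)
    if not results["is_executable"]:
        return results

    # Check 2: the first line of `curl --version` starts with "curl <version>".
    version = run([str(path), "--version"],
                  capture_output=True, text=True, timeout=30)
    lines = version.stdout.splitlines()
    tokens = lines[0].split() if lines else []
    results["version_matches"] = (
        len(tokens) >= 2 and tokens[0] == "curl" and tokens[1] == expected_version
    )

    # Check 3 (optional): the binary can complete an HTTP request.
    # --fail makes curl exit non-zero on HTTP error responses.
    if url is not None:
        fetch = run([str(path), "--silent", "--fail", "--output", os.devnull, url],
                    capture_output=True, text=True, timeout=60)
        results["http_ok"] = fetch.returncode == 0

    return results
```

Calling `check_curl_build("./src/curl", "<version from the source tree>", url="<test URL>")` would return a dictionary with one entry per check; the path, version string, and URL here are placeholders, not values taken from the benchmark.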
