DeepCodeBench: Real-World Codebase Understanding by Q&A Benchmarking


Read about Qodo's new benchmark, which evaluates how well AI agents navigate and understand large, complex enterprise codebases using real-world questions.

We introduce a benchmark built from realistic questions derived from pull requests (PRs), each requiring retrieval across multiple files in a codebase. PRs are a good source of complex code changes that come with enough context to generate questions and answers. On question scope, Codex and Claude did better on deep questions than on broad ones, while DeepResearch performed equally well on both thanks to its wide search capabilities.
