DeepCodeBench: Real-World Codebase Understanding by Q&A Benchmarking

Read about Qodo's new benchmark, which evaluates how well AI agents navigate and understand large, complex enterprise codebases using real-world questions.

We introduce a benchmark built from realistic questions derived from pull requests, where answering requires retrieval across multiple files in a codebase. We found that pull requests (PRs) are a good source of complex code changes with enough surrounding context to generate questions and answers. On question scope, Codex and Claude did better on deep questions than on broad ones, while DeepResearch performed equally well on both thanks to its wide search capabilities.
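To make the multi-file idea concrete, here is a minimal sketch of how one might pick PRs that touch several files as candidates for question generation. It is an illustration only, not Qodo's actual pipeline: the `PullRequest` record, the `multi_file_candidates` helper, the `min_files` threshold, and the sample data are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    # Hypothetical record; a real PR would carry far more metadata.
    number: int
    title: str
    changed_files: list[str] = field(default_factory=list)


def multi_file_candidates(prs, min_files=2):
    """Keep PRs whose changes span at least `min_files` distinct files.

    Assumption: touching several files is a rough proxy for a question
    that needs retrieval across the codebase.
    """
    return [pr for pr in prs if len(set(pr.changed_files)) >= min_files]


# Made-up sample data for illustration.
prs = [
    PullRequest(1, "Fix typo in README", ["README.md"]),
    PullRequest(
        2,
        "Refactor auth flow",
        ["auth/login.py", "auth/session.py", "api/routes.py"],
    ),
]

candidates = multi_file_candidates(prs)
for pr in candidates:
    print(pr.number, pr.title, len(pr.changed_files))
```

A file-count filter like this is only a first pass; whether a PR actually yields a good question would still depend on its description and the surrounding code context.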
