DeepCodeBench: Real-World Codebase Understanding by Q&A Benchmarking
Read about Qodo's new benchmark, which evaluates how well AI agents navigate and understand large, complex enterprise codebases using real-world questions.
We introduce a benchmark built from realistic questions derived from pull requests, where answering each question requires retrieval across multiple files in a codebase. Pull requests (PRs) are a good source of complex code changes with enough context to support question and answer generation. On question scope, Codex and Claude performed better on deep questions than on broad ones, while DeepResearch performed equally well on both, owing to its wide search capabilities.