Evaluating RAG for large scale codebases


Read how Qodo evaluates RAG systems to optimize generative AI coding for large-scale codebases using LLM-as-a-judge and regression testing

We wrapped the workflow in a lightweight CLI tool that runs both locally and in CI; it performs prediction and evaluation, then stores the results in an experiment tracking system. With these tools, we’ve dramatically cut the effort needed to verify whether code changes introduced unforeseen quality issues, from hours of manual probing to minutes of automated regression tests. RAG systems for large-scale codebases are a core capability used across many of Qodo’s products and, as such, require a robust evaluation and quality mechanism.
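As a rough illustration of the predict, evaluate, and store loop described above, here is a minimal Python sketch. It is not Qodo's tool. Every name in it (`predict`, `judge`, `evaluate`, `find_regressions`, the `--dataset`, `--baseline`, and `--out` flags, and the JSON layout) is hypothetical. The judge is a simple token-overlap score standing in for an LLM-as-a-judge call, and a local JSON file stands in for the experiment tracking system.

```python
import argparse
import json
from pathlib import Path


def predict(question: str) -> str:
    """Placeholder for the RAG pipeline under test (hypothetical stub)."""
    return f"stub answer about {question}"


def judge(answer: str, reference: str) -> float:
    """Stand-in for an LLM-as-a-judge call.

    Returns the fraction of reference tokens found in the answer, in [0, 1].
    """
    ref_tokens = set(reference.lower().split())
    if not ref_tokens:
        return 0.0
    return len(ref_tokens & set(answer.lower().split())) / len(ref_tokens)


def evaluate(dataset: list[dict]) -> dict:
    """Run prediction and scoring over every example, then aggregate."""
    scores = {
        ex["id"]: judge(predict(ex["question"]), ex["reference"])
        for ex in dataset
    }
    mean = sum(scores.values()) / len(scores) if scores else 0.0
    return {"scores": scores, "mean": mean}


def find_regressions(current: dict, baseline: dict, tolerance: float = 0.05) -> list[str]:
    """Return ids whose score dropped by more than `tolerance` versus the baseline."""
    return sorted(
        ex_id
        for ex_id, score in current["scores"].items()
        if ex_id in baseline["scores"]
        and baseline["scores"][ex_id] - score > tolerance
    )


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(description="RAG regression check (sketch)")
    parser.add_argument("--dataset", type=Path, required=True)
    parser.add_argument("--baseline", type=Path)
    parser.add_argument("--out", type=Path, default=Path("results.json"))
    args = parser.parse_args(argv)

    dataset = json.loads(args.dataset.read_text())
    results = evaluate(dataset)
    # Assumption: a JSON file stands in for the experiment tracking system.
    args.out.write_text(json.dumps(results, indent=2))

    regressions = []
    if args.baseline:
        baseline = json.loads(args.baseline.read_text())
        regressions = find_regressions(results, baseline)
    print(f"mean score: {results['mean']:.3f}, regressions: {regressions}")
    # A non-zero exit code lets a CI job fail when quality drops.
    return 1 if regressions else 0
```

Wired to an entry point such as `sys.exit(main())`, a script like this could run on a developer machine or as a CI step, comparing each run against a stored baseline so that a drop in per-example scores fails the build.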

