Evaluating RAG for large-scale codebases
Read how Qodo evaluates RAG systems to optimize generative AI coding for large-scale codebases using LLM-as-a-judge and regression testing
We wrapped the workflow in a lightweight CLI tool that we can run both locally and in CI. It performs prediction and evaluation, then stores the results in an experiment tracking system. Using these tools, we’ve dramatically decreased the effort involved in verifying whether code changes led to unforeseen quality issues, from hours of manual probing to minutes of automated regression testing. RAG systems for large-scale codebases are a core capability used across many of Qodo’s products and, as such, require a robust evaluation and quality mechanism.
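To make the predict, evaluate, and compare loop concrete, here is a minimal sketch of a regression gate in Python. It is not Qodo's tool: every name in it (`Example`, `toy_judge`, `evaluate`, `regression_check`) is hypothetical, the judge is a simple token-overlap stand-in for a real LLM-as-a-judge call, and the baseline score and tolerance are assumed values that a real setup would read from its experiment tracker.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List


@dataclass
class Example:
    question: str
    reference: str  # ground-truth answer for this question


def toy_judge(answer: str, reference: str) -> float:
    """Stand-in for an LLM judge (assumption): returns the fraction of
    reference tokens that also appear in the answer, in [0, 1]."""
    ref_tokens = set(reference.lower().split())
    if not ref_tokens:
        return 0.0
    ans_tokens = set(answer.lower().split())
    return len(ref_tokens & ans_tokens) / len(ref_tokens)


def evaluate(
    predict: Callable[[str], str],
    dataset: List[Example],
    judge: Callable[[str, str], float] = toy_judge,
) -> float:
    """Run the system under test on every example and average the judge scores."""
    return mean(judge(predict(ex.question), ex.reference) for ex in dataset)


def regression_check(score: float, baseline: float, tolerance: float = 0.05) -> bool:
    """Pass if the new score has not dropped more than `tolerance` below baseline."""
    return score >= baseline - tolerance
```

In CI, a wrapper script could call `evaluate` with the current build's prediction function, log the score, and exit non-zero when `regression_check` returns `False`, so that a quality drop blocks the change instead of surfacing later through manual probing.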