SOTA Code Retrieval with Efficient Code Embedding Models
Read how we trained code embedding models with synthetic data to achieve state-of-the-art performance on code retrieval.
For Qodo-Embed-1, we built a pipeline that automatically scrapes open-source code from GitHub, applies multiple filtering steps for quality, and then enriches the data with synthetic function descriptions and docstrings.

Query-to-code score for qodo-embed-1-7B: [64.34, 57.65] (correctly prefers the relevant code).

This snippet is a better fit because it directly addresses the user’s intent: it actively handles operational failures through a simple retry mechanism. Instead of relying solely on matching keywords like “failure” or “operations,” the Qodo embedding model captures the semantic meaning of the query, so it returns an example that meets the user’s practical needs.
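To make the query-to-code comparison concrete, here is a minimal sketch of how an embedding-based retriever might rank two candidate snippets against a query. It is an illustration under stated assumptions, not Qodo's implementation: the excerpt does not say how its scores are computed or scaled, so cosine similarity is used here as a common choice, and the vectors and the names `retry_wrapper` and `failure_logger` are hypothetical toy values rather than real model output.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_snippets(query_vec, snippet_vecs):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in snippet_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings; a real model would produce
# vectors with hundreds or thousands of dimensions.
query_vec = [0.9, 0.1, 0.4]
snippet_vecs = {
    "retry_wrapper": [0.8, 0.2, 0.5],   # assumed: code that retries on failure
    "failure_logger": [0.1, 0.9, 0.2],  # assumed: code that only mentions failures
}

ranking = rank_snippets(query_vec, snippet_vecs)
for name, score in ranking:
    print(f"{name}: {score:.4f}")
```

In this toy setup the snippet whose vector points in nearly the same direction as the query ranks first, regardless of which keywords either snippet contains. That is the behavior the excerpt attributes to the model.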