SOTA Code Retrieval with Efficient Code Embedding Models


Read how we trained code embedding models with synthetic data to achieve state-of-the-art performance on code retrieval.

For Qodo-Embed-1, we built a pipeline that automatically scrapes open-source code from GitHub, applies multiple filtering steps for quality, and then enriches the data with synthetic function descriptions and docstrings.

In one query-to-code comparison, qodo-embed-1-7B produced scores of [64.34, 57.65], correctly preferring the relevant code. The preferred snippet is a better fit because it directly addresses the user's intent by actively handling operational failures through a simple retry mechanism. Instead of relying solely on matching keywords like “failure” or “operations,” the Qodo embedding model captures the underlying semantic meaning of the query, so it returns an example that meets the user's practical needs.
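The score pair reflects how strongly the model matches a query to each of two candidate snippets. The sketch below shows how that kind of ranking typically works with an embedding model. It makes two assumptions that the post does not confirm: the scores are cosine similarities scaled to the range 0 to 100, and the three-dimensional vectors are hypothetical toy values rather than real qodo-embed-1 embeddings, which have far more dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_snippets(query_vec: list[float], snippet_vecs: list[list[float]]) -> list[tuple[int, float]]:
    """Return (snippet index, score) pairs, best match first.

    Scores are cosine similarity multiplied by 100 (an assumed scale).
    """
    scored = [(i, 100.0 * cosine_similarity(query_vec, v)) for i, v in enumerate(snippet_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings: a query about handling failed operations,
# a snippet that retries on failure, and a snippet that only mentions "failure".
query = [0.9, 0.1, 0.4]
snippets = {
    "retry_on_failure": [0.8, 0.2, 0.5],
    "log_failure_keyword": [0.2, 0.9, 0.1],
}

names = list(snippets)
ranking = rank_snippets(query, [snippets[name] for name in names])
for idx, score in ranking:
    print(f"{names[idx]}: {score:.2f}")
```

With a real model, the toy vectors would be replaced by the embeddings the model returns for the query text and for each code snippet; the ranking step itself stays the same.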
