SOTA Code Retrieval with Efficient Code Embedding Models

Read how we trained code embedding models with synthetic data to achieve state-of-the-art performance on code retrieval.

For Qodo-Embed-1, we built a pipeline that automatically scrapes open-source code from GitHub, applies multiple filtering steps for quality, and then augments the data with synthetic function descriptions and docstrings.

Query-to-code score: qodo-embed-1-7B: [64.34, 57.65] (correctly prefers relevant code)

This snippet is a better fit because it directly addresses the user’s intent: it actively handles operational failures through a simple retry mechanism. Instead of relying solely on matching keywords like “failure” or “operations,” the Qodo embedding model captures the underlying semantic meaning of the query, enabling it to return an example that meets the user’s practical needs.
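To make the query-to-code comparison concrete, here is a minimal sketch of how candidate snippets might be ranked against a query once both are embedded. It assumes cosine similarity as the scoring function; the excerpt does not say how the scores above were computed. The vectors are hand-written toy values rather than model output, and the names `cosine_similarity`, `rank_snippets`, `retry_on_failure`, and `log_failure_only` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def rank_snippets(query_vec, snippet_vecs):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in snippet_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional embeddings (assumed values, not produced by any model).
query = [0.9, 0.1, 0.3]  # stands in for a query about handling failed operations
snippets = {
    "retry_on_failure": [0.8, 0.2, 0.4],  # snippet that retries the operation
    "log_failure_only": [0.1, 0.9, 0.2],  # snippet that only mentions failure
}

ranking = rank_snippets(query, snippets)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With a real embedding model, the toy vectors would be replaced by the model's output for the query and for each candidate snippet; the ranking step itself stays the same.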
