Get the latest tech news
Voyage-code-3
TL;DR – Introducing voyage-code-3, our next-generation embedding model optimized for code retrieval. It outperforms OpenAI-v3-large and CodeSage-large by an average of 13.80% and 16.81% on a suite …
By supporting smaller dimensions with Matryoshka learning and quantized formats like int8 and binary, voyage-code-3 can also dramatically reduce storage and search costs with minimal impact on retrieval quality. Existing datasets can suffer from noisy labels, overly simplistic tasks, and data contamination risks, making them ill-suited for real-world applications. Our evaluation incorporated diverse tasks, such as text-to-code and code-to-code, repurposed question-answer datasets for retrieval, and introduced complex, real-world repositories and scenarios that challenge embedding models to achieve deeper understanding.
Or read this on Hacker News