voyage-3.5 and voyage-3.5-lite: improved quality for a new retrieval frontier
TL;DR – We’re excited to introduce voyage-3.5 and voyage-3.5-lite, the latest generation of our embedding models. These models offer improved retrieval quality over voyage-3 and voyage-3-lite at th…
Both models support embeddings in 2048, 1024, 512, and 256 dimensions, with multiple quantization options enabled by Matryoshka learning and quantization-aware training.

Each dataset consists of a corpus (e.g., technical documentation, court opinions) and queries (e.g., questions, summaries).

Category | Descriptions | Datasets
TECH | Technical documentation | Cohere, 5G, OneSignal, LangChain, PyTorch
CODE | Code snippets, docstrings | LeetCodeCpp-rtl, LeetCodeJava-rtl, LeetCodePython-rtl, HumanEval-rtl, MBPP-rtl, DS1000-referenceonly-rtl, DS1000-rtl, APPS-rtl
LAW | Cases, court opinions, statutes, patents | LeCaRDv2, LegalQuAD, LegalSummarization, AILA casedocs, AILA statutes
FINANCE | SEC filings, finance QA | RAG benchmark (Apple-10K-2022), FinanceBench, TAT-QA-rtl, Finance Alpaca, FiQA-Personal-Finance-rtl, Stock News Sentiment, ConvFinQA-rtl, FinQA-rtl, HC3 Finance
WEB | Reviews, forum posts, policy pages | Huffpostsports, Huffpostscience, Doordash, Health4CA
LONG-CONTEXT | Long documents on assorted topics: government reports, academic papers, and dialogues | NarrativeQA, Needle, Passkey, QMSum, SummScreenFD, WikimQA
CONVERSATION | Meeting transcripts, dialogues | Dialog Sum, QA Conv, HQA
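To make the dimension and quantization options concrete, here is a minimal sketch of what shortening and quantizing an embedding can look like on the client side. It is an illustration under stated assumptions, not Voyage's implementation: the vector is random rather than a real model output, the helper names (`truncate_and_normalize`, `quantize_int8`, `quantize_binary`) are hypothetical, and the int8 scaling and sign-based binary rule are common generic schemes rather than the ones the models were trained with. With a Matryoshka-trained model, the leading components of the full vector are intended to remain useful on their own, which is why simple truncation is the natural operation here.

```python
import math
import random


def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components, then rescale to unit length."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)
    return [x / norm for x in head]


def quantize_int8(vec, scale=127.0):
    """Map floats assumed to lie in [-1, 1] to integers in [-127, 127].

    The fixed scale is an assumption made for this sketch.
    """
    return [max(-127, min(127, round(x * scale))) for x in vec]


def quantize_binary(vec):
    """One bit per dimension (1 if the component is positive), 8 bits per byte."""
    out = bytearray()
    for i in range(0, len(vec), 8):
        chunk = vec[i:i + 8]
        byte = 0
        for x in chunk:
            byte = (byte << 1) | (1 if x > 0 else 0)
        # Left-align the bits if the final chunk is shorter than 8.
        byte <<= 8 - len(chunk)
        out.append(byte)
    return bytes(out)


# Stand-in for a 2048-dimensional embedding (random, not a model output).
random.seed(0)
full = truncate_and_normalize(
    [random.gauss(0.0, 1.0) for _ in range(2048)], 2048
)

# Compare storage cost per vector across dimensions and precisions.
for dim in (2048, 1024, 512, 256):
    small = truncate_and_normalize(full, dim)
    float32_bytes = 4 * len(small)
    int8_bytes = len(quantize_int8(small))
    binary_bytes = len(quantize_binary(small))
    print(dim, float32_bytes, int8_bytes, binary_bytes)
```

The loop prints, for each dimension, the bytes needed per vector at 32-bit float, int8, and binary precision, which shows why the combination of shorter vectors and lower precision matters for vector database cost. How much retrieval quality survives each setting depends on the model and is not something this sketch measures.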