32k context length text embedding models
TL;DR – We are excited to announce voyage-3 and voyage-3-lite embedding models, advancing the frontier of retrieval quality, latency, and cost. voyage-3 outperforms OpenAI v3 large by 7.55% on aver…
voyage-3 outperforms OpenAI v3 large across all eight evaluated domains (tech, code, web, law, finance, multilingual, conversation, and long-context) by 7.55% on average. If you are particularly interested in code, law, finance, or multilingual retrieval, the Voyage 2 series domain-specific models (voyage-code-2, voyage-law-2, voyage-finance-2, and voyage-multilingual-2) are still the best for their respective domains, although voyage-3 is highly competitive there as well (see Section below).

| Category | Descriptions | Datasets |
| --- | --- | --- |
| TECH | Technical documentation | Cohere, 5G, OneSignal, LangChain, PyTorch |
| CODE | Code snippets, docstrings | LeetCodeCpp, LeetCodeJava, LeetCodePython, HumanEval, MBPP, DS1000-referenceonly, DS1000, apps_5doc |
| LAW | Cases, court opinions, statutes, patents | LeCaRDv2, LegalQuAD, LegalSummarization, AILA casedocs, AILA statutes |
| FINANCE | SEC filings, finance QA | RAG benchmark (Apple-10K-2022), FinanceBench, TAT-QA, Finance Alpaca, FiQA Personal Finance, Stock News Sentiment, ConvFinQA, FinQA, HC3 Finance |
| WEB | Reviews, forum posts, policy pages | Huffpostsports, Huffpostscience, Doordash, Health4CA |
| LONG-CONTEXT | Long documents on assorted topics: government reports, academic papers, and dialogues | NarrativeQA, Needle, Passkey, QMSum, SummScreenFD, WikimQA |
| CONVERSATION | Meeting transcripts, dialogues | Dialog Sum, QA Conv, HQA |

Models.