Voxtral-Mini-3B-2507 – Open source speech understanding model
Voxtral Mini is an enhancement of Ministral 3B, incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance.

Key features:
Audio analysis and summarization: Analyze audio and generate structured summaries without the need for separate ASR and language models.
Natively multilingual: Automatic language detection and state-of-the-art performance in the world’s most widely used languages (English, Spanish, French, Portuguese, Hindi, German, Dutch, Italian).
Function-calling straight from voice: Enables direct triggering of backend functions, workflows, or API calls based on spoken user intents.
Highly capable at text: Retains the text understanding capabilities of its language model backbone, Ministral 3B.

Transcription quality is reported as average word error rate (WER) over the FLEURS, Mozilla Common Voice and Multilingual LibriSpeech benchmarks.
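
For readers unfamiliar with the metric, word error rate is the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the model's output, divided by the number of words in the reference. The sketch below is a minimal illustration in Python; the function name `word_error_rate` and the sample sentences are made up for this example and are not part of the Voxtral release. It also skips the text normalization (casing, punctuation) that benchmark scoring typically applies, so it should not be expected to reproduce published numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper only: splits on whitespace and does no normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missing)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the") over 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

A lower value is better, and because insertions count as errors the score can exceed 1.0 when the output is much longer than the reference.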