Voxtral-Mini-3B-2507 – Open source speech understanding model

Voxtral Mini is an enhancement of Ministral 3B, incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance. Key features:

- Analyze audio and generate structured summaries without the need for separate ASR (automatic speech recognition) and language models.
- Natively multilingual: Automatic language detection and state-of-the-art performance in the world's most widely used languages (English, Spanish, French, Portuguese, Hindi, German, Dutch, Italian).
- Function-calling straight from voice: Enables direct triggering of backend functions, workflows, or API calls based on spoken user intents.
- Highly capable at text: Retains the text understanding capabilities of its language model backbone, Ministral 3B.

Average word error rate (WER) over the FLEURS, Mozilla Common Voice and Multilingual LibriSpeech benchmarks:
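Word error rate, the metric used in the benchmark comparison, counts the word substitutions (S), deletions (D) and insertions (I) needed to turn a model's transcript into the reference transcript, divided by the number of reference words N: WER = (S + D + I) / N, so lower is better. The sketch below illustrates that definition with a word-level Levenshtein distance. It is not the evaluation code behind the Voxtral numbers: splitting on whitespace, skipping text normalization (real benchmark pipelines usually lowercase and strip punctuation first), and averaging benchmarks with an unweighted mean are all assumptions, and `word_error_rate` / `average_wer` are hypothetical helper names.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N via word-level Levenshtein distance.

    Assumption: words are split on whitespace with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Row 0: turning an empty reference into hyp[:j] costs j insertions.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        # prev[j] is the edit distance between ref[:i-1] and hyp[:j];
        # curr[0] = i because dropping i reference words costs i deletions.
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def average_wer(per_benchmark: dict[str, float]) -> float:
    """Unweighted mean of per-benchmark WER scores (assumed averaging scheme)."""
    if not per_benchmark:
        raise ValueError("need at least one benchmark score")
    return sum(per_benchmark.values()) / len(per_benchmark)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the): 2 errors / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Running the file prints 0.3333333333333333, since the hypothesis contains one substituted and one deleted word out of six reference words. Because insertions are counted too, WER can exceed 1.0 when a transcript adds many spurious words.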