aiOla drops ultra-fast ‘multi-head’ speech recognition model, beats OpenAI Whisper
To develop the Whisper-Medusa speech recognition model, aiOla modified Whisper's architecture to add a multi-head attention mechanism.
"We addressed these challenges by employing our novel multi-head attention approach, which resulted in a model with nearly double the prediction speed while maintaining Whisper's high levels of accuracy," Hetz said.

He told VentureBeat the company has started with a 10-head model but will soon expand to a larger 20-head version capable of predicting 20 tokens at a time, leading to faster recognition and transcription without any loss of accuracy. Eventually, he believes, improvements in recognition and transcription speed will allow for faster turnaround times in speech applications and pave the way for real-time responses.
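The excerpt does not include implementation details, so the snippet below is only a minimal PyTorch sketch of the general Medusa-style idea the quote describes: several small prediction heads sit on top of the decoder's last hidden state so that one forward pass proposes multiple tokens (ten here) instead of one. The class name `MultiHeadTokenPredictor`, the layer sizes, and the per-head design are illustrative assumptions, not aiOla's actual architecture.

```python
import torch
import torch.nn as nn


class MultiHeadTokenPredictor(nn.Module):
    """Illustrative Medusa-style prediction heads (not aiOla's code).

    From a single decoder hidden state, each of K small heads proposes the
    token at a different lookahead position, so one forward pass yields K
    draft tokens rather than one.
    """

    def __init__(self, d_model: int, vocab_size: int, num_heads: int = 10):
        super().__init__()
        self.heads = nn.ModuleList(
            [
                nn.Sequential(
                    nn.Linear(d_model, d_model),      # small per-head projection
                    nn.SiLU(),
                    nn.Linear(d_model, vocab_size),   # per-head vocabulary logits
                )
                for _ in range(num_heads)
            ]
        )

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, d_model) -- the decoder's last hidden state.
        # Returns (batch, num_heads, vocab_size): one logit row per lookahead position.
        return torch.stack([head(hidden) for head in self.heads], dim=1)


if __name__ == "__main__":
    # Stand-in sizes for illustration only.
    predictor = MultiHeadTokenPredictor(d_model=512, vocab_size=51_865, num_heads=10)
    hidden = torch.randn(1, 512)                     # placeholder decoder state
    draft_tokens = predictor(hidden).argmax(dim=-1)  # 10 candidate tokens in one pass
    print(draft_tokens.shape)                        # torch.Size([1, 10])
```

In practice, draft tokens produced this way are typically verified against the base model before being accepted, which is how such approaches can speed up decoding without sacrificing accuracy; the verification step is omitted here for brevity.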