A new, open source text-to-speech model called Dia has arrived to challenge ElevenLabs, OpenAI and more
With a focus on expressive quality, reproducibility, and open access, Dia adds a distinctive new voice to the landscape of text-to-speech.
A two-person startup called Nari Labs has introduced Dia, a 1.6-billion-parameter text-to-speech (TTS) model designed to produce naturalistic dialogue directly from text prompts. One of its creators claims it surpasses the performance of competing proprietary offerings such as ElevenLabs and Google's hit NotebookLM AI podcast generation product. In one set of tests, Nari Labs noted that Sesame's best website demo likely used an internal 8B version of Sesame's model rather than its public 1B checkpoint, resulting in a gap between advertised and actual performance. Nari Labs credits Dia's development to support from the Google TPU Research Cloud and Hugging Face's ZeroGPU grant program, as well as to prior work on SoundStorm, Parakeet, and Descript Audio Codec.
Or read this on VentureBeat