A new, open source text-to-speech model called Dia has arrived to challenge ElevenLabs, OpenAI and more


With a focus on expressive quality, reproducibility, and open access, Dia adds a distinctive new voice to the landscape of text-to-speech.

A two-person startup called Nari Labs has introduced Dia, a 1.6 billion parameter text-to-speech (TTS) model designed to produce naturalistic dialogue directly from text prompts. One of its creators claims it surpasses competing proprietary offerings from the likes of ElevenLabs and Google’s hit NotebookLM AI podcast generation product. In one set of tests, Nari Labs noted that Sesame’s best website demo likely used an internal 8B version of Sesame’s model rather than the public 1B checkpoint, resulting in a gap between advertised and actual performance. For Dia’s development, Nari Labs credits support from the Google TPU Research Cloud and Hugging Face’s ZeroGPU grant program, as well as prior work on SoundStorm, Parakeet, and Descript Audio Codec.

Read the full story on VentureBeat
