VibeVoice: A Frontier Open-Source Text-to-Speech Model


VibeVoice is a novel framework designed for generating expressive, long-form, multi-speaker conversational audio, such as podcasts, from text. It addresses significant challenges in traditional Text-to-Speech (TTS) systems, particularly in scalability, speaker consistency, and natural turn-taking. A core innovation of VibeVoice is its use of continuous speech tokenizers (Acoustic and Semantic) operating at an ultra-low frame rate of 7.5 Hz.
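To give a sense of what a 7.5 Hz frame rate means for sequence length, the following is a minimal arithmetic sketch, not VibeVoice code. The function name `frames_for_duration` and the 50 Hz comparison rate are illustrative assumptions; only the 7.5 Hz figure comes from the summary above.

```python
import math

# Frame rate stated in the summary above.
VIBEVOICE_FRAME_RATE_HZ = 7.5

# Hypothetical comparison rate, chosen only for illustration.
# It is not taken from the source.
COMPARISON_FRAME_RATE_HZ = 50.0


def frames_for_duration(seconds: float, frame_rate_hz: float = VIBEVOICE_FRAME_RATE_HZ) -> int:
    """Return the number of tokenizer frames needed to cover `seconds` of audio."""
    return math.ceil(seconds * frame_rate_hz)


if __name__ == "__main__":
    for minutes in (1, 10, 60):
        seconds = minutes * 60
        low = frames_for_duration(seconds)
        high = frames_for_duration(seconds, COMPARISON_FRAME_RATE_HZ)
        print(f"{minutes:>3} min: {low:>6} frames at 7.5 Hz, {high:>6} frames at 50 Hz")
```

Under these assumptions, the sequence a downstream model must handle grows linearly with audio length, so a lower frame rate directly shortens the sequence for long-form audio such as a podcast.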
