AI speech generator 'reaches human parity' – but it's too dangerous to release


Microsoft's VALL-E 2 can convincingly recreate human voices using just a few seconds of audio, its creators claim.

Microsoft researchers said VALL-E 2 was capable of generating "accurate, natural speech in the exact voice of the original speaker, comparable to human performance," in a paper that appeared June 17 on the pre-print server arXiv. "VALL-E 2 is the latest advancement in neural codec language models that marks a milestone in zero-shot text-to-speech synthesis (TTS), achieving human parity for the first time," the researchers wrote in the paper. "VALL-E 2 could synthesize speech that maintains speaker identity and could be used for educational learning, entertainment, journalistic, self-authored content, accessibility features, interactive voice response systems, translation, chatbot, and so on," the researchers added.
