AI speech generator 'reaches human parity' – but it's too dangerous to release
Microsoft's VALL-E 2 can convincingly recreate human voices using just a few seconds of audio, its creators claim.
Microsoft researchers said VALL-E 2 was capable of generating "accurate, natural speech in the exact voice of the original speaker, comparable to human performance," in a paper that appeared June 17 on the pre-print server arXiv.
"VALL-E 2 is the latest advancement in neural codec language models that marks a milestone in zero-shot text-to-speech synthesis (TTS), achieving human parity for the first time," the researchers wrote in the paper.
"VALL-E 2 could synthesize speech that maintains speaker identity and could be used for educational learning, entertainment, journalistic, self-authored content, accessibility features, interactive voice response systems, translation, chatbot, and so on," the researchers added.