Crossing the uncanny valley of conversational voice
At Sesame, our goal is to achieve “voice presence”—the magical quality that makes spoken interactions feel real, understood, and valued.
Building a digital companion with voice presence is not easy, but we are making steady progress on multiple fronts, including personality, memory, expressivity, and appropriateness.

Johan Schalkwyk, Ankit Kumar, Dan Lyth, Sefik Emre Eskimez, Zack Hodari, Cinjon Resnick, Ramon Sanabria, Raven Jiang

In the first comparative mean opinion score (CMOS) study, we presented generated and human audio samples without context and asked listeners to “choose which rendition feels more like human speech.” In the second CMOS study, we also provided the previous 90 seconds of audio and text context and asked listeners to “choose which rendition feels like a more appropriate continuation of the conversation.” Eighty people were paid to participate in the evaluation, and each rated 15 examples on average.
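The listening studies described above collect pairwise judgments between a generated and a human rendition. The following minimal Python sketch shows one way such judgments might be aggregated. It assumes each judgment is stored as a binary choice. The actual rating scale and analysis used in the study are not described here. The `Judgment` record and the `preference_rate` and `wald_interval` helpers are hypothetical. A rate near 0.5 would mean listeners showed no consistent preference between the two renditions.

```python
import math
from dataclasses import dataclass


@dataclass
class Judgment:
    # Hypothetical record: one listener's choice for one paired example.
    listener_id: int
    example_id: int
    prefers_generated: bool  # True if the generated rendition was chosen over the human one


def preference_rate(judgments: list[Judgment]) -> float:
    """Fraction of judgments preferring the generated audio (0.5 = no preference)."""
    if not judgments:
        raise ValueError("no judgments to aggregate")
    return sum(j.prefers_generated for j in judgments) / len(judgments)


def wald_interval(p: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% confidence interval for a proportion (normal approximation)."""
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Rough study size implied by the text: 80 listeners x ~15 examples each.
approx_total_ratings = 80 * 15  # 1200


if __name__ == "__main__":
    # Toy data (not study results): alternate preferences across 10 judgments.
    toy = [Judgment(listener_id=i, example_id=i, prefers_generated=(i % 2 == 0)) for i in range(10)]
    rate = preference_rate(toy)
    print(rate, wald_interval(rate, approx_total_ratings))
```

The Wald interval is the simplest choice. With roughly 1,200 ratings it is a reasonable first check of whether the rate differs from 0.5. However, it treats every rating as independent and ignores correlation between ratings from the same listener.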