Crossing the uncanny valley of conversational voice


At Sesame, our goal is to achieve “voice presence”—the magical quality that makes spoken interactions feel real, understood, and valued.

Building a digital companion with voice presence is not easy, but we are making steady progress on multiple fronts, including personality, memory, expressivity and appropriateness.

Johan Schalkwyk, Ankit Kumar, Dan Lyth, Sefik Emre Eskimez, Zack Hodari, Cinjon Resnick, Ramon Sanabria, Raven Jiang

In the first CMOS study, we presented the generated and human audio samples with no context and asked listeners to “choose which rendition feels more like human speech.” In the second CMOS study, we also provided the previous 90 seconds of audio and text context and asked listeners to “choose which rendition feels like a more appropriate continuation of the conversation.” Eighty paid participants took part in the evaluation, each rating 15 examples on average.
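
As a rough illustration of how paired listener judgments like these can be summarized, the sketch below averages comparative ratings and attaches an approximate 95% confidence interval. The rating scale (-3 to +3), the sample values, and the function name `cmos` are assumptions made for this example; the excerpt does not say which scale or analysis Sesame used.

```python
import math
import statistics


def cmos(scores):
    """Return (mean, approximate 95% CI half-width) for comparative ratings.

    Assumed scale (hypothetical, not from the article):
    -3 = human rendition strongly preferred, 0 = no preference,
    +3 = generated rendition strongly preferred.
    """
    n = len(scores)
    mean = statistics.fmean(scores)
    if n < 2:
        # A confidence interval needs at least two ratings.
        return mean, float("nan")
    # Normal approximation: 1.96 standard errors on each side of the mean.
    half_width = 1.96 * statistics.stdev(scores) / math.sqrt(n)
    return mean, half_width


# Made-up ratings for illustration only.
ratings = [1, 0, -1, 2, 0, -2, 1, 0]
mean, ci = cmos(ratings)
print(f"CMOS = {mean:+.2f} +/- {ci:.2f}")
```

A mean near zero would indicate that listeners showed no clear preference between the generated and human renditions under the assumed scale.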
