INFP: Audio-Driven Interactive Head Generation in Dyadic Conversations
INFP achieves interactive head generation for dyadic conversations.
Unlike previous head generation works that focus only on single-sided communication, or that require manual role assignment and explicit role switching, our model drives the agent portrait to alternate dynamically between speaking and listening states, guided by the input dyadic audio. INFP comprises two stages: the first stage projects facial communicative behaviors from real-life conversation videos into a low-dimensional motion latent space, using the motion latent codes to animate a static image; the second stage learns the mapping from the input dyadic audio to motion latent codes through denoising, yielding audio-driven head generation in interactive scenarios. To facilitate this line of research, we introduce DyConv, a large-scale dataset of rich dyadic conversations collected from the Internet.
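The two-stage design above can be sketched as a toy pipeline: a motion latent space whose codes animate a static portrait, and an audio-to-motion mapping conditioned on both speakers' audio so the agent can speak or listen without explicit role switching. Everything here (dimensions, the linear stand-ins for the learned networks, function names) is an illustrative assumption, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

AUDIO_DIM = 64     # per-speaker audio feature size (assumed)
MOTION_DIM = 16    # low-dimensional motion latent space (assumed)
PORTRAIT_DIM = 32  # static portrait feature size (assumed)

# Stage 1 (head imitation): in the paper this stage is learned from real
# conversation videos; a fixed random linear decoder stands in for it here.
decoder = rng.standard_normal((MOTION_DIM, PORTRAIT_DIM))

def animate(portrait_feat, motion_latent):
    """Apply a motion latent code to static portrait features (toy stand-in)."""
    return portrait_feat + motion_latent @ decoder

# Stage 2 (audio-guided motion generation): map the dyadic audio (agent and
# conversation-partner tracks) to a motion latent code. A single linear
# projection stands in for the learned denoising network.
proj = rng.standard_normal((2 * AUDIO_DIM, MOTION_DIM)) / np.sqrt(2 * AUDIO_DIM)

def audio_to_motion(agent_audio, partner_audio):
    """Condition on BOTH audio streams, so speaking vs. listening behavior
    emerges from the input rather than from an explicit role switch."""
    dyadic = np.concatenate([agent_audio, partner_audio])
    return dyadic @ proj

# One frame of the pipeline on random features.
agent_audio = rng.standard_normal(AUDIO_DIM)
partner_audio = rng.standard_normal(AUDIO_DIM)
portrait = rng.standard_normal(PORTRAIT_DIM)

latent = audio_to_motion(agent_audio, partner_audio)
frame = animate(portrait, latent)
print(latent.shape, frame.shape)
```

The key property the sketch mirrors is that the motion latent depends on both audio tracks jointly, which is what lets a single model cover both speaking and listening states.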