Google shows off Lumiere, a space-time diffusion model for realistic AI videos
Lumiere was trained on a dataset of 30 million videos and their text captions, and can generate 80 frames at 16 fps, about five seconds of footage. The source of that training data, however, remains unclear at this stage.
While these capabilities are not new in the industry, having already been offered by players like Runway and Pika, the authors claim that most existing models handle the added temporal dimension of video generation (each frame representing a state in time) with a cascaded approach: a base model produces distant keyframes, and separate temporal super-resolution models fill in the frames between them. This works, but it makes temporal consistency difficult to achieve, often restricting video duration, overall visual quality, and the degree of realistic motion these models can generate. Lumiere, for its part, addresses this gap with a Space-Time U-Net architecture that generates the entire temporal duration of the video at once, in a single pass through the model, leading to more realistic and coherent motion, as in the sketch below.
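To make the single-pass idea concrete, here is a minimal PyTorch sketch of a toy space-time U-Net backbone. It is not Google's implementation; the module names (TinySTUNet, SpaceTimeBlock), layer sizes, and frame counts are illustrative assumptions. The point it demonstrates is that 3D convolutions with stride in both the frame and pixel dimensions let the network compress and reconstruct the whole 80-frame clip in one forward pass, rather than generating sparse keyframes and filling in between them. The diffusion sampling loop itself is omitted.

```python
# Hypothetical sketch, not the paper's code: a toy space-time U-Net
# that processes an entire video clip (all frames) in a single pass.
import torch
import torch.nn as nn

class SpaceTimeBlock(nn.Module):
    """Downsamples a video tensor in space AND time, so deeper layers
    see the whole clip at a coarser spatio-temporal resolution."""
    def __init__(self, channels):
        super().__init__()
        # 3D conv with stride 2 on (frames, height, width): mixes
        # information across frames as well as across pixels.
        self.conv = nn.Conv3d(channels, channels, kernel_size=3,
                              stride=(2, 2, 2), padding=1)
        self.act = nn.SiLU()

    def forward(self, x):  # x: (batch, channels, frames, height, width)
        return self.act(self.conv(x))

class TinySTUNet(nn.Module):
    """Toy single-pass backbone: the full clip goes in, is compressed
    in time and space, then reconstructed with skip connections."""
    def __init__(self, channels=16):
        super().__init__()
        self.stem = nn.Conv3d(3, channels, 3, padding=1)
        self.down1 = SpaceTimeBlock(channels)  # e.g. 80 frames -> 40
        self.down2 = SpaceTimeBlock(channels)  # 40 frames -> 20
        # Transposed 3D convs upsample time and space back to the input size.
        self.up1 = nn.ConvTranspose3d(channels, channels, 4, stride=2, padding=1)
        self.up2 = nn.ConvTranspose3d(channels, channels, 4, stride=2, padding=1)
        self.head = nn.Conv3d(channels, 3, 3, padding=1)

    def forward(self, x):
        h0 = self.stem(x)
        h1 = self.down1(h0)
        h2 = self.down2(h1)
        u1 = self.up1(h2) + h1  # U-Net-style skip connections
        u2 = self.up2(u1) + h0
        return self.head(u2)

if __name__ == "__main__":
    clip = torch.randn(1, 3, 80, 64, 64)  # 80 frames of 64x64 RGB
    out = TinySTUNet()(clip)
    print(out.shape)  # torch.Size([1, 3, 80, 64, 64]) -- whole clip, one pass
```

Because every layer sees the full (compressed) clip, temporal consistency is enforced inside one network instead of being stitched together across a cascade of separately trained stages.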
Or read this on VentureBeat