Tom and Jerry One-Minute Video Generation with Test-Time Training
A new approach using Test-Time Training (TTT) layers to generate coherent, minute-long videos from text.
Adding TTT layers to a pre-trained Transformer enables it to generate one-minute videos from text storyboards with strong temporal consistency and motion smoothness. In human evaluation, TTT-MLP outperforms all other baselines in temporal consistency, motion smoothness, and overall aesthetics, as measured by Elo scores.
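For readers unfamiliar with Elo scores, the following is a minimal sketch of how pairwise human preferences could be turned into ratings. It uses the standard Elo update rule; the summary does not say which exact variant or constants were used, so the initial rating of 1000, the K-factor of 32, and the names `expected_score`, `update_elo`, and `rate_methods` are illustrative assumptions, not the authors' evaluation code.

```python
from typing import Dict, Iterable, Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(
    rating_a: float, rating_b: float, score_a: float, k: float = 32.0
) -> Tuple[float, float]:
    """Return updated ratings after one comparison.

    score_a is 1.0 if A was preferred, 0.0 if B was preferred, 0.5 for a tie.
    k = 32 is an assumed K-factor, not a value taken from the source.
    """
    exp_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - exp_a)
    # The update is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


def rate_methods(
    comparisons: Iterable[Tuple[str, str]], initial: float = 1000.0, k: float = 32.0
) -> Dict[str, float]:
    """Process (preferred, other) pairs in order and return final ratings."""
    ratings: Dict[str, float] = {}
    for winner, loser in comparisons:
        r_w = ratings.setdefault(winner, initial)
        r_l = ratings.setdefault(loser, initial)
        ratings[winner], ratings[loser] = update_elo(r_w, r_l, 1.0, k)
    return ratings


if __name__ == "__main__":
    # Hypothetical votes between two placeholder methods, not real results.
    votes = [("method_a", "method_b"), ("method_a", "method_b"), ("method_b", "method_a")]
    print(rate_methods(votes))
```

A method that is preferred more often in head-to-head comparisons ends up with a higher rating, which is the sense in which one method "outperforms" another on a given evaluation axis.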