Get the latest tech news
Diffusion models are real-time game engines
Diffusion Models Are Real-Time Game Engines
This allows the network to correct information sampled in previous frames, and we found it to be critical for preserving visual stability over long time periods. Latent Decoder Fine-Tuning: The pre-trained auto-encoder of Stable Diffusion v1.4, which compresses 8x8 pixel patches into 4 latent channels, results in meaningful artifacts when predicting game frames, which affect small details and particularly the bottom bar HUD. To leverage the pre-trained knowledge while improving image quality, we train just the decoder of the latent auto-encoder using an MSE loss computed against the target frame pixels.
Or read this on Hacker News