Tsinghua University and Shengshu Technology release Vidu, a text-to-video model said to rival Sora

China's Shengshu Technology and Tsinghua University have unveiled Vidu, a text-to-video model capable of generating 16-second clips at 1080p resolution with a single click. Vidu is based on a Universal Vision Transformer (U-ViT) architecture, which the company says allows it to simulate the real physical world with multi-camera view generation. According to the company, Vidu can generate videos with complex scenes adhering to real-world physics, such as realistic lighting and shadows, and detailed facial expressions.
