Tsinghua Uni just released Vidu, a text-to-video model as good as Sora



China's Shengshu Technology and Tsinghua University have unveiled Vidu, a text-to-video model capable of generating 16-second clips at 1080p resolution with a single click. Vidu is based on a Universal Vision Transformer (U-ViT) architecture, which Shengshu says allows it to simulate the real physical world with multi-camera view generation. The company also says Vidu can generate videos of complex scenes that adhere to real-world physics, with realistic lighting and shadows and detailed facial expressions.

