Video models are zero-shot learners and reasoners


Video models like Veo 3 are on a path to become vision foundation models.

Veo 3 shows emergent zero-shot abilities across many visual tasks, indicating that video models are on a path to becoming vision foundation models, just as LLMs became foundation models for language. For LLMs, that transformation emerged from simple primitives: large, generative models trained on web-scale data. We demonstrate that Veo 3 can solve, zero-shot, a broad variety of tasks it wasn't explicitly trained for: segmenting objects, detecting edges, editing images, understanding physical properties, recognizing object affordances, simulating tool use, and more.
