
VGGT: Visual Geometry Grounded Transformer

[CVPR 2025] VGGT: Visual Geometry Grounded Transformer - facebookresearch/vggt

Visual Geometry Grounded Transformer (VGGT, CVPR 2025) is a feed-forward neural network that directly infers all key 3D attributes of a scene from one, a few, or hundreds of its views, within seconds. These attributes include extrinsic and intrinsic camera parameters, point maps, depth maps, and 3D point tracks. We did not quantitatively test monocular depth estimation performance ourselves, but @kabouzeid generously provided a comparison of VGGT with recent methods. In that comparison, VGGT shows competitive or better results than state-of-the-art monocular approaches such as DepthAnything v2 or MoGe, despite never being explicitly trained for single-view tasks.
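Several of these outputs are geometrically linked: a depth map, combined with a view's intrinsic and extrinsic camera parameters, determines a world-space point map for that view. The sketch below is illustrative only and is not taken from the VGGT codebase. It assumes a simple pinhole camera (intrinsics `K` with focal lengths `fx`, `fy` and principal point `cx`, `cy`), z-depth values, and a world-to-camera extrinsic convention `X_cam = R @ X_world + t`. The function name `unproject_depth` is hypothetical.

```python
def unproject_depth(depth, K, R, t):
    """Hypothetical helper: lift a per-pixel depth map to world-space 3D points.

    depth: H x W nested list of z-depths in camera coordinates.
    K:     3x3 pinhole intrinsics [[fx, 0, cx], [0, fy, cy], [0, 0, 1]].
    R, t:  assumed world-to-camera extrinsics, X_cam = R X_world + t.
    Returns an H x W nested list of (x, y, z) world-space points.
    """
    fx, fy = K[0][0], K[1][1]
    cx, cy = K[0][2], K[1][2]
    points = []
    for v, row in enumerate(depth):          # v indexes image rows
        out_row = []
        for u, z in enumerate(row):          # u indexes image columns
            # Back-project pixel (u, v) to a camera-space point at depth z.
            p_cam = ((u - cx) * z / fx, (v - cy) * z / fy, z)
            # Invert X_cam = R X_world + t, giving X_world = R^T (X_cam - t).
            d = [p_cam[i] - t[i] for i in range(3)]
            p_world = tuple(sum(R[j][i] * d[j] for j in range(3)) for i in range(3))
            out_row.append(p_world)
        points.append(out_row)
    return points


# Usage: identity rotation, camera shifted so X_cam = X_world + (0, 0, 1).
K = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
R = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(unproject_depth([[2.0, 2.0]], K, R, [0, 0, 1]))
# prints [[(0.0, 0.0, 1.0), (2.0, 0.0, 1.0)]]
```

Under other conventions (camera-to-world extrinsics, or depth measured along the ray rather than along z), the inversion step changes accordingly.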
