VGGT: Visual Geometry Grounded Transformer
[CVPR 2025] VGGT: Visual Geometry Grounded Transformer - facebookresearch/vggt
Visual Geometry Grounded Transformer (VGGT, CVPR 2025) is a feed-forward neural network that directly infers all key 3D attributes of a scene, including extrinsic and intrinsic camera parameters, point maps, depth maps, and 3D point tracks, from one, a few, or hundreds of views of that scene, within seconds. We did not quantitatively test monocular depth estimation performance ourselves, but @kabouzeid generously provided a comparison of VGGT with recent methods. In that comparison, VGGT shows competitive or better results compared to state-of-the-art monocular approaches such as DepthAnything v2 or MoGe, despite never being explicitly trained for single-view tasks.
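To make the relationship between the listed outputs concrete, here is a minimal NumPy sketch of how a depth map and camera parameters together determine a point map under a standard pinhole model. It is an illustration only, not VGGT's code: the function name `unproject_depth`, the camera-to-world pose convention, and the toy values are all assumptions made for this example.

```python
import numpy as np


def unproject_depth(depth, K, cam_to_world):
    """Lift a depth map to a world-space point map (pinhole model).

    depth:        (H, W) depth along the camera z-axis
    K:            (3, 3) intrinsic matrix
    cam_to_world: (4, 4) rigid transform; this convention is an
                  assumption of the sketch, not taken from VGGT
    """
    h, w = depth.shape
    # Pixel grid: u indexes columns (x), v indexes rows (y).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    # Back-project each pixel into camera coordinates.
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    pts_cam = np.stack([x, y, depth], axis=-1)  # (H, W, 3)
    # Move the points from the camera frame into the world frame.
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    return pts_cam @ R.T + t


# Toy inputs (hypothetical values): a flat surface 2 units from the camera.
depth = np.full((3, 5), 2.0)
K = np.array([[100.0, 0.0, 2.0],
              [0.0, 100.0, 1.0],
              [0.0, 0.0, 1.0]])
pose = np.eye(4)
pose[0, 3] = 1.0  # camera shifted one unit along world x

points = unproject_depth(depth, K, pose)
```

Pose conventions (camera-to-world versus world-to-camera) differ between libraries, so the repository should be consulted for the convention the model's predicted extrinsics actually follow.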