OmniParser V2 – A simple screen parsing tool towards pure vision based GUI agent


A simple screen parsing tool towards pure vision based GUI agent - microsoft/OmniParser

[2025/2] We introduce OmniTool: control a Windows 11 VM with OmniParser and your vision model of choice. OmniParser v2 (to be released soon) achieves a new state-of-the-art result of 39.5% on the new grounding benchmark Screen Spot Pro.

[2024/11] We release an updated version, OmniParser V1.5, which features 1) more fine-grained detection of small icons and 2) prediction of whether each screen element is interactable.
