OmniParser V2 – A simple screen parsing tool towards pure vision based GUI agent
Repository: microsoft/OmniParser
[2025/2] We introduce OmniTool: control a Windows 11 VM with OmniParser plus your vision model of choice. We achieve a new state-of-the-art result of 39.5% on the new grounding benchmark Screen Spot Pro with OmniParser v2 (to be released soon)!
[2024/11] We release an updated version, OmniParser V1.5, which features 1) more fine-grained detection of small icons and 2) prediction of whether each screen element is interactable.