ByteDance’s UI-TARS can take over your computer, outperforms GPT-4o and Claude
UI-TARS understands graphical user interfaces (GUIs), applies reasoning and takes autonomous, step-by-step action.
For example, in a demo video released today, the model is prompted to “Find round trip flights from SEA to NYC on the 5th and return on the 10th next month and filter by price in ascending order.” In response, UI-TARS navigates to the Delta Airlines website, fills in the “from” and “to” fields, selects the relevant dates, then sorts and filters by price, explaining each step in its thinking box before taking action.
The model also posts strong benchmark results. In VisualWebBench, which measures a model’s ability to ground web elements including webpage quality assurance and optical character recognition, UI-TARS 72B scored 82.8%, outperforming GPT-4o (78.5%) and Claude 3.5 (78.2%).
Or read this on VentureBeat