
ByteDance’s UI-TARS can take over your computer, outperforms GPT-4o and Claude


UI-TARS understands graphical user interfaces (GUIs), applies reasoning and takes autonomous, step-by-step action.

For example, in a demo video released today, the model is prompted to “Find round trip flights from SEA to NYC on the 5th and return on the 10th next month and filter by price in ascending order.” In response, UI-TARS navigates to the Delta Air Lines website, fills in the “from” and “to” fields, clicks on the relevant dates, and sorts and filters by price, explaining each step in its thinking box before taking action.

On benchmarks, UI-TARS outperforms leading models. In VisualWebBench, which measures a model’s ability to understand and ground web elements across tasks including webpage question answering and optical character recognition, UI-TARS 72B scored 82.8%, outperforming GPT-4o (78.5%) and Claude 3.5 (78.2%).


Read the original on VentureBeat.


Related news:


CapCut, a Video-Editing App From ByteDance, Returns for U.S. Users


Anthropic plans to release a ‘two-way’ voice mode for Claude


TikTok is back: the briefly 'banned' app has been restored to 170 million US users