OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Notice: t = temperature, top-p = top-p cutoff, len = max context length

Date            Model                Source                          Score
Mar 20, 2024    Gemini-Pro Vision    Gemini Team, Google, '23        5.80
April 23, 2024  GPT-4 Vision (0409)  OpenAI, '23                     5.40
Mar 20, 2024    GPT-4 Vision         OpenAI, '23                     5.26
Mar 20, 2024    Claude-3-Opus        Anthropic, '24                  2.42
Mar 20, 2024    CogAgent             Tsinghua University & Zhipu AI

Date            Model                Source                          Score
Mar 20, 2024    GPT-4 Vision         OpenAI, '23                     12.17
April 23, 2024  GPT-4 Vision (0409)  OpenAI, '23                     9.04
Mar 20, 2024    Claude-3-Opus        Anthropic, '24                  4.41
Mar 20, 2024    Gemini-Pro Vision    Gemini Team, Google, '23        3.48
Mar 20, 2024    CogAgent             Tsinghua University & Zhipu AI

Date            Model                Source                          Score
Mar 20, 2024    GPT-4 Vision         OpenAI, '23                     11.77
April 23, 2024  GPT-4 Vision (0409)  OpenAI, '23                     8.40
Mar 20, 2024    Claude-3-Opus        Anthropic, '24                  6.72
Mar 20, 2024    Gemini-Pro Vision    Gemini Team, Google, '23        1.06
Mar 20, 2024    CogAgent             Tsinghua University & Zhipu AI