
LMSYS launches ‘Multimodal Arena’: GPT-4 tops leaderboard, but AI still can’t out-see humans


LMSYS's new Multimodal Arena reveals GPT-4 leads in AI vision tasks, but benchmark results show even top models lag behind human visual understanding and reasoning capabilities.

A more sobering picture emerges from the recently introduced CharXiv benchmark, developed by Princeton University researchers to assess how well AI models understand charts from scientific papers. The gap between model and human performance highlights a crucial challenge in AI development: while models have made impressive strides in tasks like object recognition and basic image captioning, they still struggle with the nuanced reasoning and contextual understanding that humans apply effortlessly to visual information. As companies race to integrate multimodal AI capabilities into products ranging from virtual assistants to autonomous vehicles, understanding the true limits of these systems becomes increasingly critical.


Read this on VentureBeat


Related news:


To catch code errors made by ChatGPT, OpenAI uses human AI trainers in the hope of improving the model. To support those trainers, OpenAI has developed another AI model, CriticGPT, which flags mistakes the humans might otherwise miss.


OpenAI Wants AI to Help Humans Train AI


‘AI systems should never be able to deceive humans’ | One of China’s leading advocates for artificial intelligence safeguards says international collaboration is key.