Every AI model is flunking medicine - and LMArena proposes a fix


The benchmarking outfit is partnering with BiomedArena, a leaderboard specific to medical research.

The new study, comparing OpenAI's GPT-5 with numerous models from Google, Anthropic, and Meta, finds that "performance in real-world biomedical research remains far from adequate." The BiomedArena work, according to the LMArena team, will "focus on tasks and evaluation strategies grounded in the day-to-day realities of biomedical discovery -- from interpreting experimental data and literature to assisting in hypothesis generation and clinical translation." From today's announcement, it's not clear how LMArena and DataTecnica plan to address the ability of AI models to tap into resources, which really is a kind of agentic capability.

Read this on ZDNet
