Every AI model is flunking medicine - and LMArena proposes a fix

The benchmarking outfit is partnering with BiomedArena, a leaderboard specific to medical research.

The new study, comparing OpenAI's GPT-5 with numerous models from Google, Anthropic, and Meta, finds that "performance in real-world biomedical research remains far from adequate." The BiomedArena work, according to the LMArena team, will "focus on tasks and evaluation strategies grounded in the day-to-day realities of biomedical discovery -- from interpreting experimental data and literature to assisting in hypothesis generation and clinical translation." From today's announcement, it's not clear how LMArena and DataTecnica plan to address the models' ability to tap into resources, which is really a kind of agentic capability.

Or read this on ZDNet