
Hugging Face releases a benchmark for testing generative AI on health tasks


Hugging Face has published a new leaderboard and benchmark for evaluating generative AI models on health-related tasks and questions.

Open Medical-LLM isn’t a from-scratch benchmark per se, but rather a stitching-together of existing test sets — MedQA, PubMedQA, MedMCQA and so on — designed to probe models for general medical knowledge and related fields, such as anatomy, pharmacology, genetics and clinical practice. “[Open Medical-LLM] enables researchers and practitioners to identify the strengths and weaknesses of different approaches, drive further advancements in the field and ultimately contribute to better patient care and outcome,” Hugging Face writes in a blog post. High benchmark scores are no guarantee of real-world usefulness, though: medical AI tools that test well on paper have frustrated patients and nurses in deployment with inconsistent results and a general lack of harmony with on-the-ground practices.
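The test sets Open Medical-LLM draws on (MedQA, MedMCQA and similar) are largely multiple-choice exams, and leaderboard-style scoring reduces to accuracy over the chosen options. A minimal sketch of that scoring loop, using made-up example items and a stand-in "model" rather than the actual Open Medical-LLM harness or data:

```python
# Hypothetical sketch of multiple-choice medical QA scoring, in the style of
# MedQA/MedMCQA-type items: the model picks one option per question, and the
# benchmark score is the fraction answered correctly. The items and the
# baseline picker below are illustrative, not from the real test sets.

from typing import Callable

# MedMCQA-style items: a question stem, answer options, and the gold index.
QUESTIONS = [
    {
        "stem": "Which vitamin deficiency causes scurvy?",
        "options": ["Vitamin A", "Vitamin B12", "Vitamin C", "Vitamin D"],
        "answer": 2,
    },
    {
        "stem": "Warfarin acts by antagonizing which vitamin?",
        "options": ["Vitamin C", "Vitamin K", "Vitamin E", "Vitamin B6"],
        "answer": 1,
    },
]


def accuracy(pick_answer: Callable[[str, list], int]) -> float:
    """Score any callable that maps (stem, options) to an option index."""
    correct = sum(
        1 for q in QUESTIONS if pick_answer(q["stem"], q["options"]) == q["answer"]
    )
    return correct / len(QUESTIONS)


# Stand-in "model" that always picks the first option, as a trivial baseline.
baseline = lambda stem, options: 0
print(f"baseline accuracy: {accuracy(baseline):.2f}")  # prints 0.00 here
```

A real evaluation run would swap the baseline for a generative model's constrained choice over the options, but the scoring shape stays the same.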


Or read this on TechCrunch

Read more on:


generative AI


benchmark


Hugging Face

Related news:


Honeywell exec reveals plan to deliver $100 million in value with generative AI: “Just getting started”


Dataminr debuts ReGenAI pairing predictive with generative AI for real time information


Thomson Reuters unveils CoCounsel, leveraging generative AI for legal professionals