Hugging Face releases a benchmark for testing generative AI on health tasks
Hugging Face has published a new leaderboard and benchmark for evaluating generative AI models on health-related tasks and questions.
Open Medical-LLM isn’t a from-scratch benchmark per se, but rather a stitching-together of existing test sets (MedQA, PubMedQA, MedMCQA and so on) designed to probe models for general medical knowledge and knowledge of related fields, such as anatomy, pharmacology, genetics and clinical practice.

“[Open Medical-LLM] enables researchers and practitioners to identify the strengths and weaknesses of different approaches, drive further advancements in the field and ultimately contribute to better patient care and outcomes,” Hugging Face writes in a blog post.

Benchmark scores warrant caution, though: despite high theoretical accuracy, medical AI tools have proved impractical in real-world testing, frustrating both patients and nurses with inconsistent results and a general lack of harmony with on-the-ground practices.