AI is dumber than you think

The top generative AI company, OpenAI, gave leading chatbots an “SAT test.” The chatbots failed miserably.

OpenAI recently introduced SimpleQA, a new benchmark for evaluating the factual accuracy of the large language models (LLMs) that underpin generative AI (genAI). The idea is similar to the SATs, which emphasize not information that anybody and everybody knows but harder questions that high school students would struggle with and have to work hard to master.

More than 30,000 clinicians and 40 health systems, including the Children’s Hospital Los Angeles, are using a tool called Nabla, which is based on OpenAI's Whisper transcription model but optimized for medical lingo.
