AI isn’t very good at history, new paper finds


Top LLMs performed poorly on a high-level history test, a new paper has found.

A team of researchers has created a new benchmark to test three top large language models (LLMs), OpenAI’s GPT-4, Meta’s Llama, and Google’s Gemini, on historical questions. The benchmark, Hist-LLM, tests the correctness of answers according to the Seshat Global History Databank, a vast database of historical knowledge named after the ancient Egyptian goddess of wisdom. “They’re great for basic facts, but when it comes to more nuanced, PhD-level historical inquiry, they’re not yet up to the task,” said Maria del Rio-Chanona, one of the paper’s co-authors and an associate professor of computer science at University College London.

Source: TechCrunch
