These researchers used NPR Sunday Puzzle questions to benchmark AI ‘reasoning’ models


Researchers used questions from the NPR Sunday Puzzle challenge to build a benchmark to test AI 'reasoning' models.

“We wanted to develop a benchmark with problems that humans can understand with only general knowledge,” Arjun Guha, a computer science faculty member at Northeastern and one of the co-authors on the study, told TechCrunch. Most of the tests commonly used to evaluate AI models probe for skills, like competency on PhD-level math and science questions, that aren’t relevant to the average user. The advantage of a public radio quiz game like the Sunday Puzzle, Guha explained, is that it doesn’t test for esoteric knowledge, and its challenges are phrased so that models can’t draw on “rote memory” to solve them.