
Task-free intelligence testing of LLMs

I recently wrote about the apparently narrow focus of LLM evaluation on "task-based" testing. The typical eval has a set of tasks, questions, problems, etc. that need to be solved or answered, and a model is scored on how many it answers correctly.
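A minimal sketch of that task-based scoring scheme follows. It assumes exact string match as the correctness check; real benchmarks often normalize answers or use more lenient matching. `task_accuracy`, `toy_model`, and the toy tasks are hypothetical names for illustration only, not any particular benchmark's API.

```python
from typing import Callable, Sequence


def task_accuracy(
    model: Callable[[str], str],
    tasks: Sequence[tuple[str, str]],
) -> float:
    """Return the fraction of (prompt, reference) pairs answered correctly.

    Hypothetical helper: "correct" here means the model's output equals the
    reference answer after stripping surrounding whitespace.
    """
    if not tasks:
        raise ValueError("need at least one task")
    correct = sum(
        1
        for prompt, reference in tasks
        if model(prompt).strip() == reference.strip()
    )
    return correct / len(tasks)


# Usage with a toy "model" (a lookup table standing in for an LLM).
toy_tasks = [("2 + 2 =", "4"), ("Capital of France?", "Paris"), ("3 * 3 =", "9")]


def toy_model(prompt: str) -> str:
    answers = {"2 + 2 =": "4", "Capital of France?": "Paris"}
    return answers.get(prompt, "unknown")


print(task_accuracy(toy_model, toy_tasks))  # 2 of 3 tasks correct
```

The whole score collapses to a single ratio over a fixed task list, which is exactly the property a task-free approach would try to move beyond.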

