Task-free intelligence testing of LLMs
I recently wrote about the apparently narrow focus of LLM evaluation on "task-based" testing. The typical eval has a set of tasks, questions, or problems to be solved or answered, and a model is scored on how many it gets right.
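To make the "task-based" pattern concrete, here is a minimal sketch of such a scoring loop. It is not taken from any particular benchmark: the `Task` type, the `score_model` function, the exact-match comparison, and the `toy_model` stand-in are all hypothetical names and assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Task:
    """One eval item: a prompt and the answer counted as correct (hypothetical type)."""
    prompt: str
    expected: str


def score_model(model: Callable[[str], str], tasks: list[Task]) -> float:
    """Return the fraction of tasks the model answers correctly.

    Assumption: correctness is a case-insensitive exact match after trimming
    whitespace. Real evals often use looser or more elaborate graders.
    """
    if not tasks:
        return 0.0
    correct = sum(
        model(task.prompt).strip().lower() == task.expected.strip().lower()
        for task in tasks
    )
    return correct / len(tasks)


def toy_model(prompt: str) -> str:
    """Stand-in for an LLM call: a fixed lookup table, purely illustrative."""
    answers = {"2 + 2 = ?": "4", "Capital of France?": "Paris"}
    return answers.get(prompt, "I don't know")


tasks = [
    Task("2 + 2 = ?", "4"),
    Task("Capital of France?", "paris"),
    Task("Largest planet?", "Jupiter"),
]

print(f"accuracy: {score_model(toy_model, tasks):.2f}")
```

The single number that comes out is the whole result: the model is judged only on how many predefined items it gets right, which is the narrow focus the post is questioning.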
