Killed by LLM
A memorial to the benchmarks that defined, and were defeated by, AI progress.

Killed 1 month ago: An abstract reasoning challenge built from visual pattern completion tasks. Each task presents a sequence of abstract visual patterns and requires selecting the correct completion.
Killed 7 months ago: A curated suite of 23 challenging tasks from BIG-Bench on which language models initially performed below the average human level.

Killed 5 years ago: A collection of more challenging language understanding tasks, including word sense disambiguation, causal reasoning, and reading comprehension.

Killed 5 years ago: A collection of carefully crafted sentence pairs containing ambiguous pronoun references, where a small change to the wording flips which noun the pronoun refers to.