
Killed by LLM


A memorial to the benchmarks that defined, and were defeated by, AI progress.

Killed 1 month ago: Abstract reasoning challenge consisting of visual pattern completion tasks. Each task presents a sequence of abstract visual patterns and requires selecting the correct completion.

Killed 7 months ago: A curated suite of 23 challenging tasks from BIG-Bench where language models initially performed below average human level.

Killed 5 years ago: A collection of more challenging language understanding tasks including word sense disambiguation, causal reasoning, and reading comprehension.

Killed 5 years ago: A collection of carefully crafted sentence pairs with ambiguous pronoun references that resolve differently based on small changes.


