Will We Run Out of Data? Limits of LLM Scaling Based on Human-Generated Data


We estimate the stock of human-generated public text at around 300 trillion tokens. If trends continue, language models will fully utilize this stock between 2026 and 2032, or even earlier if intensely overtrained.
