Will We Run Out of Data? Limits of LLM Scaling Based on Human-Generated Data
We estimate the stock of human-generated public text at around 300 trillion tokens. If trends continue, language models will fully utilize this stock between 2026 and 2032, or even earlier if intensely overtrained.