Anthropic destroyed millions of print books to build its AI models
Company hired Google’s book-scanning chief to cut up and digitize “all the books in the world.”…
To understand why Anthropic would want to scan millions of books, it's important to know that AI researchers build large language models (LLMs), like those that power ChatGPT and Claude, by feeding billions of words into a neural network. In the quest for high-quality training data, the court filing states, Anthropic first chose to amass digitized versions of pirated books to avoid what CEO Dario Amodei called "legal/practice/business slog," meaning the complex licensing negotiations with publishers. And earlier this month, OpenAI and Microsoft announced they're working with Harvard's libraries to train AI models on nearly 1 million public domain books dating back to the 15th century, books that were fully digitized but physically preserved to live another day.
Or read this on ArsTechnica