Anthropic destroyed millions of print books to build its AI models


Company hired Google’s book-scanning chief to cut up and digitize “all the books in the world.”…

To understand why Anthropic would want to scan millions of books, it helps to know that AI researchers build large language models (LLMs), like those that power ChatGPT and Claude, by feeding billions of words into a neural network. In the quest for high-quality training data, the court filing states, Anthropic first chose to amass digitized versions of pirated books to avoid what CEO Dario Amodei called "legal/practice/business slog": the complex licensing negotiations with publishers. Earlier this month, OpenAI and Microsoft announced they're working with Harvard's libraries to train AI models on nearly 1 million public domain books dating back to the 15th century, fully digitized but preserved to live another day.

Read the full story on Ars Technica