Publishers Target Common Crawl In Fight Over AI Training Data


Long-running nonprofit Common Crawl has been a boon to researchers for years. But now its role in AI training data has triggered backlash from publishers.

Common Crawl’s evolution from a low-key tool beloved by data nerds and ignored by everyone else to a newly controversial AI helpmeet is part of a larger clash over copyright and the open web.

Earlier this year, it led a campaign to file Digital Millennium Copyright Act (DMCA) takedown notices, which alert companies to potentially infringing content hosted on their platforms, for book publishers whose work had been uploaded to OpenAI’s GPT Store without their permission.

He thinks that scuppering Common Crawl might primarily impact newcomers and smaller projects in addition to academics, entrenching today’s power players in their current dominant positions and calcifying the field.

Or read this on Wired
