Publishers Target Common Crawl In Fight Over AI Training Data
Long-running nonprofit Common Crawl has been a boon to researchers for years. But now its role in AI training data has triggered backlash from publishers.
Common Crawl’s evolution from low-key tool beloved by data nerds and ignored by everyone else to a newly controversial AI helpmeet is part of a larger clash over copyright and the open web.
Earlier this year, it led a campaign to file Digital Millennium Copyright Act (DMCA) takedown notices, which alert companies to potentially infringing content hosted on their platforms, for book publishers whose work had been uploaded to OpenAI’s GPT Store without their permission.
He thinks that scuppering Common Crawl might primarily impact newcomers and smaller projects in addition to academics, entrenching today’s power players in their current dominant positions and calcifying the field.
Or read this on Wired