
Multiple AI Companies Ignore Robots.txt Files, Scrape Web Content, Says Licensing Firm

Multiple AI companies are ignoring robots.txt files meant to block the scraping of web content for generative AI systems, Reuters reports, citing a warning sent to publishers by content licensing startup TollBit. TollBit, an early-stage startup, is positioning itself as a matchmaker between c...

The company tracks AI traffic to the publishers' websites and uses analytics to help both sides settle on fees to be paid for the use of different types of content... TollBit said its analytics indicate "numerous" AI agents are bypassing the protocol, a standard tool used by publishers to indicate which parts of their sites can be crawled. "What this means in practical terms is that AI agents from multiple sources (not just one company) are opting to bypass the robots.txt protocol to retrieve content from sites," TollBit wrote.
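
For readers unfamiliar with the protocol, here is a minimal sketch of how a compliant crawler might consult robots.txt before fetching a page, using Python's standard-library `urllib.robotparser`. The agent name `ExampleAIBot`, the rules, and the URLs are hypothetical and chosen only for illustration; they do not come from TollBit's report or from any real publisher.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for illustration only):
# one named AI agent is blocked everywhere, all other agents are only
# blocked from /private/.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed("ExampleAIBot", "https://example.com/articles/1"))
    print(is_allowed("OtherBot", "https://example.com/articles/1"))
    print(is_allowed("OtherBot", "https://example.com/private/report"))
```

The check runs entirely on the crawler's side. Nothing in the protocol enforces the answer, so an agent that skips the check, or ignores its result, can still request the page, which is the behavior TollBit describes.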

Read this on Slashdot