Multiple AI Companies Ignore Robots.txt Files, Scrape Web Content, Says Licensing Firm


Multiple AI companies are ignoring robots.txt files meant to block the scraping of web content for generative AI systems, reports Reuters, citing a warning sent to publishers by content licensing startup TollBit. TollBit, an early-stage startup, is positioning itself as a matchmaker between c...

The company tracks AI traffic to the publishers' websites and uses analytics to help both sides settle on fees to be paid for the use of different types of content... TollBit said its analytics indicate "numerous" AI agents are bypassing the protocol, a standard tool used by publishers to indicate which parts of their sites can be crawled. "What this means in practical terms is that AI agents from multiple sources (not just one company) are opting to bypass the robots.txt protocol to retrieve content from sites," TollBit wrote.
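
For readers unfamiliar with the protocol, the sketch below shows roughly how a compliant crawler would consult a robots.txt file before fetching a page, using Python's standard urllib.robotparser module. The file contents, the bot names "ExampleAIBot" and "OtherBot", and the example.com URLs are all hypothetical and do not come from the Reuters or TollBit reports. The protocol is advisory: nothing in it prevents a crawler from skipping this check, which is the behavior TollBit describes.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary publisher: one named AI crawler
# is barred from the whole site; every other agent is barred only from /private/.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without making a network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A well-behaved crawler asks before fetching each URL.
print(parser.can_fetch("ExampleAIBot", "https://example.com/articles/1"))  # False
print(parser.can_fetch("OtherBot", "https://example.com/articles/1"))      # True
print(parser.can_fetch("OtherBot", "https://example.com/private/x"))       # False
```

A crawler that honors the file would skip any URL for which can_fetch returns False; one that bypasses the protocol simply never makes this call.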

Read this on Slashdot
