OpenAI, Anthropic ignoring rule that prevents bots scraping online content


OpenAI and Anthropic have said publicly they respect robots.txt. But they are among the biggest tech companies ignoring the rule, BI has learned.

The world's top two AI startups are ignoring requests from media publishers to stop scraping their web content for free model training data, Business Insider has learned. A spokeswoman for OpenAI declined to comment beyond pointing BI to a corporate blog post from May, in which the company says it takes web crawler permissions "into account each time we train a new model." The answers AI chatbots produce are only possible because the models they are built on include massive amounts of written text and data scraped from the web, much of it under copyright or otherwise owned by creators.

Related news:

OpenAI's First Acquisition Is Enterprise Data Startup 'Rockset'

Why Anthropic’s Artifacts may be this year’s most important AI feature: Unveiling the interface battle

OpenAI’s first acquisition is an enterprise data startup