OpenAI, Anthropic ignoring rule that prevents bots scraping online content

OpenAI and Anthropic have said publicly they respect robots.txt. But they are among the biggest tech companies ignoring the rule, BI has learned.

The world's top two AI startups are ignoring requests by media publishers to stop scraping their web content for free model training data, Business Insider has learned. A spokeswoman for OpenAI declined to comment beyond pointing BI to a corporate blog post from May, in which the company says it takes web crawler permissions "into account each time we train a new model." The answers that AI chatbots give are only possible because the models they are built on include massive amounts of written text and data scraped from the web, much of it under copyright or otherwise owned by creators.
