
AI Crawlers Haven't Learned To Play Nice With Websites


SourceHut, an open-source-friendly git-hosting service, says web crawlers run by AI companies are slowing its services down with their excessive demands for data. From a report: "SourceHut continues to face disruptions due to aggressive LLM crawlers," the biz reported Monday on its status page. "We are ...

We have deployed a number of mitigations which are keeping the problem contained for now," the status update continued. SourceHut said it had deployed Nepenthes, a tar pit designed to trap web crawlers that scrape data, primarily for training large language models, and noted that doing so might degrade access to some web pages for users. "We have unilaterally blocked several cloud providers, including GCP [Google Cloud] and [Microsoft] Azure, for the high volumes of bot traffic originating from their networks," the biz said, advising administrators of services that integrate with SourceHut to get in touch to arrange an exception to the blocking.
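A tar pit like Nepenthes works by serving crawlers an effectively endless maze of machine-generated pages, drip-fed slowly, so each trapped bot burns its own time and bandwidth rather than the host's. Here is a minimal sketch of that idea in Python using only the standard library; the page content, link scheme, delay, and port are illustrative assumptions, not Nepenthes' actual behavior:

```python
# Minimal tar-pit sketch in the spirit of Nepenthes -- NOT its actual
# implementation. Serves an effectively endless page of generated text,
# drip-fed slowly, with links that lead only deeper into the maze.
import random
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

WORDS = ["crawler", "corpus", "token", "latent", "gradient", "scrape"]

class TarpitHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        try:
            self.wfile.write(b"<html><body>")
            for _ in range(10_000):          # effectively endless for a bot
                babble = " ".join(random.choices(WORDS, k=40))
                link = f'<a href="/{random.getrandbits(64)}">next</a>'
                self.wfile.write(f"<p>{babble} {link}</p>".encode())
                self.wfile.flush()
                time.sleep(2)                # waste the crawler's time
        except (BrokenPipeError, ConnectionResetError):
            pass                             # the crawler gave up

    def log_message(self, *args):            # silence per-request logging
        pass

if __name__ == "__main__":
    # Illustrative bind address/port; route only suspected bots here.
    HTTPServer(("127.0.0.1", 8080), TarpitHandler).serve_forever()
```

In practice only suspected bot traffic would be routed into such an endpoint, for example via paths disallowed in robots.txt or user-agent rules, since anything caught in the maze, human or bot, stays stuck there, which is why SourceHut warns that the mitigation may degrade access for legitimate users.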


