
AI Crawlers Haven't Learned To Play Nice With Websites


SourceHut, an open-source-friendly git-hosting service, says web crawlers run by AI companies are slowing its services down with their excessive demands for data. From a report: "SourceHut continues to face disruptions due to aggressive LLM crawlers," the biz reported Monday on its status page. "We are ...

We have deployed a number of mitigations which are keeping the problem contained for now," the status update continued. SourceHut said it had deployed Nepenthes, a tar pit designed to trap web crawlers that scrape data, primarily for training large language models, and noted that doing so might degrade access to some web pages for users. "We have unilaterally blocked several cloud providers, including GCP [Google Cloud] and [Microsoft] Azure, for the high volumes of bot traffic originating from their networks," the biz said, advising administrators of services that integrate with SourceHut to get in touch to arrange an exception to the blocking.
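A tar pit like Nepenthes works by serving crawlers an effectively endless maze of machine-generated pages, drip-fed slowly, so each trapped bot burns its own time and bandwidth rather than the host's. Here is a minimal sketch of that idea in Python using only the standard library; the page content, link scheme, delay, and port are illustrative assumptions, not Nepenthes' actual behavior:

```python
# Minimal tar-pit sketch in the spirit of Nepenthes -- NOT its actual
# implementation. Serves an effectively endless page of generated text,
# drip-fed slowly, with links that lead only deeper into the maze.
import random
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

WORDS = ["crawler", "corpus", "token", "latent", "gradient", "scrape"]

class TarpitHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        try:
            self.wfile.write(b"<html><body>")
            for _ in range(10_000):          # effectively endless for a bot
                babble = " ".join(random.choices(WORDS, k=40))
                link = f'<a href="/{random.getrandbits(64)}">next</a>'
                self.wfile.write(f"<p>{babble} {link}</p>".encode())
                self.wfile.flush()
                time.sleep(2)                # waste the crawler's time
        except (BrokenPipeError, ConnectionResetError):
            pass                             # the crawler gave up

    def log_message(self, *args):            # silence per-request logging
        pass

if __name__ == "__main__":
    # Illustrative bind address/port; route only suspected bots here.
    HTTPServer(("127.0.0.1", 8080), TarpitHandler).serve_forever()
```

In practice only suspected bot traffic would be routed into such an endpoint, for example via paths disallowed in robots.txt or user-agent rules, since anything caught in the maze, human or bot, stays stuck there, which is why SourceHut warns that the mitigation may degrade access for legitimate users.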


