AI crawlers need to be more respectful
We talk a bit about the AI crawler abuse we are seeing at Read the Docs, and warn that this behavior is not sustainable.
Bots repeatedly download large files hundreds of times a day, partly because of bugs in their crawlers, and the traffic comes from many IP addresses with no rate or bandwidth limiting. Because our Community site only hosts open source projects, AWS and Cloudflare give us sponsored plans, but we have only a limited number of credits each year. And because many of these files are large and rarely downloaded, the cached copy has usually expired, so the requests hit our origin servers directly and cause substantial bandwidth charges.
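To make the complaint concrete, here is a minimal sketch of the two behaviors the excerpt says these crawlers lack: a per-host rate limit, and a record of what has already been downloaded so the same large file is not fetched again. It is not taken from the Read the Docs post. The class name `PoliteScheduler`, its methods, and the 10-second default delay are all hypothetical choices made for illustration.

```python
import time
from urllib.parse import urlsplit


class PoliteScheduler:
    """Hypothetical helper: decides whether a crawler may fetch a URL now."""

    def __init__(self, min_delay_seconds=10.0, clock=time.monotonic):
        # min_delay_seconds is an assumed value, not a documented guideline.
        self.min_delay_seconds = min_delay_seconds
        self.clock = clock
        self.next_allowed = {}  # host -> earliest time of the next request
        self.seen = set()       # URLs that were already downloaded

    def wait_time(self, url):
        """Return seconds to wait before fetching, or None to skip the URL."""
        if url in self.seen:
            return None  # already downloaded; do not fetch the same file again
        host = urlsplit(url).netloc
        return max(0.0, self.next_allowed.get(host, 0.0) - self.clock())

    def record_fetch(self, url):
        """Mark the URL as downloaded and start the per-host delay."""
        self.seen.add(url)
        host = urlsplit(url).netloc
        self.next_allowed[host] = self.clock() + self.min_delay_seconds
```

A crawler loop would call `wait_time(url)` before each request, sleep for the returned number of seconds (or skip the URL when it returns `None`), and call `record_fetch(url)` afterwards. For a crawler that spreads requests across many IP addresses, this state would need to be shared by all workers. Otherwise each worker enforces its own limit and the origin still sees the combined load.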