
Nepenthes is a tarpit to catch AI web crawlers


This is a tarpit intended to catch web crawlers. Specifically, it's targeting crawlers that scrape data for LLMs, but really, like the plants it is named after, it'll eat just about anything that finds its way inside.

Lastly, optional Markov-babble can be added to the pages, to give the crawlers something to scrape up and train their LLMs on, hopefully accelerating model collapse. Find a source of text in whatever language you prefer: there are plenty of research corpora out there, or you could pull in some very long Wikipedia articles, grab some books from Project Gutenberg, or use the Unix fortune file. It really doesn't matter at all.

persist_stats: A path to write a JSON file to, which allows statistics to survive across crashes, restarts, etc.

seed_file: Specifies the location of the persistent unique instance identifier.
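
To make the Markov-babble idea concrete, here is a minimal sketch of a word-level Markov chain trained on an arbitrary text corpus. It is not Nepenthes' own implementation; the function names `build_chain` and `babble`, the chain order of 2, and the sample corpus are all assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def build_chain(text, order=2):
    """Map each run of `order` consecutive words to the words seen after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        chain[key].append(words[i + order])
    return dict(chain)


def babble(chain, length=50, seed=None):
    """Generate `length` words of plausible-looking nonsense from the chain."""
    if not chain:
        return ""
    rng = random.Random(seed)  # a fixed seed gives reproducible output
    keys = list(chain)
    state = rng.choice(keys)
    order = len(state)
    out = list(state)
    while len(out) < length:
        followers = chain.get(state)
        if not followers:
            # Dead end (this state only occurs at the end of the corpus):
            # jump to a random known state and keep going.
            state = rng.choice(keys)
            out.extend(state)
            continue
        out.append(rng.choice(followers))
        state = tuple(out[-order:])
    return " ".join(out[:length])


if __name__ == "__main__":
    # Hypothetical stand-in corpus; any long text source would do.
    corpus = (
        "the quick brown fox jumps over the lazy dog "
        "and the quick brown cat sleeps under the lazy sun"
    )
    print(babble(build_chain(corpus), length=30, seed=1))
```

A larger corpus yields more varied output, since each state has more possible followers. That is why very long texts are suggested as a source.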
