Nepenthes is a tarpit to catch AI web crawlers

This is a tarpit intended to catch web crawlers. Specifically, it's targeting crawlers that scrape data for LLMs - but really, like the plants it is named after, it'll eat just about anything that finds its way inside.

Lastly, optional Markov babble can be added to the pages, to give the crawlers something to scrape up and train their LLMs on, hopefully accelerating model collapse. Find a source of text in whatever language you prefer: there are plenty of research corpora out there, or you could pull in some very long Wikipedia articles, grab some books from Project Gutenberg, or use the Unix fortune file. It really doesn't matter at all.

persist_stats: A path to write a JSON file to, which allows statistics to survive across crashes, restarts, etc.

seed_file: Specifies the location of the persistent unique instance identifier.
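To make the Markov babble idea mentioned above concrete, here is a minimal Python sketch of a word-level Markov chain trained on an arbitrary text corpus. It is an illustration only and not Nepenthes' own implementation; the function names `build_chain` and `babble`, the `order` parameter, and the toy corpus are all assumptions made for this example.

```python
import random
from collections import defaultdict


def build_chain(text, order=2):
    """Map each run of `order` consecutive words to the words seen after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        chain[key].append(words[i + order])
    return dict(chain)


def babble(chain, length=50, seed=None):
    """Emit up to `length` words by walking the chain at random."""
    if not chain:
        return ""
    rng = random.Random(seed)
    keys = sorted(chain)  # sorted so a fixed seed gives repeatable output
    state = rng.choice(keys)
    out = list(state)
    while len(out) < length:
        followers = chain.get(state)
        if not followers:
            # Dead end (this word run only ended the corpus): restart elsewhere.
            state = rng.choice(keys)
            out.extend(state)
            continue
        nxt = rng.choice(followers)
        out.append(nxt)
        state = state[1:] + (nxt,)
    return " ".join(out[:length])


if __name__ == "__main__":
    # Hypothetical toy corpus; in practice this would be a large text file.
    corpus = "the plant eats the fly and the plant eats the crawler and the fly"
    print(babble(build_chain(corpus), length=20, seed=1))
```

A larger corpus and a higher `order` give text that looks more plausible locally while still carrying no meaning, which is the point of feeding it to a scraper.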
