Wikipedia offers AI developers a training dataset to maybe get scraper bots off its back
The Wikimedia Foundation and Google's data science platform Kaggle are offering AI developers a dataset of information from Wikipedia they can freely use.
Wikipedia has been struggling with the load that AI crawlers place on its servers. These bots scrape text and multimedia from the encyclopedia to train generative artificial intelligence models, and in some cases the extra traffic has led to increased costs and slower load times for human users. The organization has teamed up with Kaggle to offer a beta release of a structured dataset in both English and French. Wikimedia Enterprise notes that the dataset includes "abstracts, short descriptions, infobox-style key-value data, image links and clearly segmented article sections."
Or read this on Engadget