Scientists once hoarded pre-nuclear steel; now we’re hoarding pre-AI content

Newly announced catalog collects pre-2022 sources untouched by ChatGPT and AI contamination.

"The idea is to point to sources of text, images and video that were created prior to the explosion of AI-generated content," Graham-Cumming wrote on his blog last week. One casualty of AI contamination was wordfreq, a Python library created by researcher Robyn Speer that tracked word frequency across more than 40 languages by analyzing millions of sources, including Wikipedia, movie subtitles, news articles, and social media. In 2020, I proposed creating a so-called "cryptographic ark," a timestamped archive of pre-AI media that future historians could verify as authentic, collected before my then-arbitrary cutoff date of January 1, 2022.

Read the full article on Ars Technica.