Project Analyzing Human Language Usage Shuts Down Because 'Generative AI Has Polluted the Data'


The creator of an open source project that scraped the internet to determine the ever-changing popularity of different words in human language usage says that they are sunsetting the project because generative AI spam has poisoned the internet to a level where the project no longer has any utility. ...

404 Media: Wordfreq is a program that tracked the ever-changing ways people used more than 40 different languages by analyzing millions of sources across Wikipedia, movie and TV subtitles, news articles, books, websites, Twitter, and Reddit. The system could be used to analyze changing language habits as slang and popular culture changed and language evolved, and was a resource for academics who study such things. The project's creator said that open web scraping was an important part of the project's data sources, and "now the web at large is full of slop generated by large language models, written by no one to communicate nothing."
