AI Firms Say They Can't Respect Copyright. But A Nonprofit's Researchers Just Built a Copyright-Respecting Dataset


"Is copyrighted material a requirement for training AI?" asks the Washington Post. That's what top AI companies are arguing, and "Few AI developers have tried the more ethical route, until now." "A group of more than two dozen AI researchers have found that they could build a massive eight-t...

"They tested the dataset quality by using it to train a 7 billion parameter language model, which performed about as well as comparable industry efforts, such as Llama 2-7B, which Meta released in 2023." "This isn't a thing where you can just scale up the resources that you have available" like access to more computer chips and a fancy web scraper, said Stella Biderman [executive director of the nonprofit research institute EleutherAI]. The group's initiative also builds on recent efforts to develop more ethical, but still useful, datasets, such as FineWeb from Hugging Face, the open-source repository for machine learning...

Read this on Slashdot
