Here’s Proof You Can Train an AI Model Without Slurping Copyrighted Content


OpenAI claimed it's "impossible" to build good AI models without using copyrighted data. An “ethically created” large language model and a giant AI dataset of public domain text suggest otherwise.

On Wednesday, a group of researchers backed by the French government released what they claim is the largest available AI training dataset for language models composed entirely of public domain text. Ed Newton-Rex founded the nonprofit Fairly Trained in January 2024 after quitting his executive role at image generation startup Stability AI because he disagreed with its policy of scraping content without permission.

Or read this on Wired
