OpenAI Furious DeepSeek Might Have Stolen All the Data OpenAI Stole from Us


OpenAI shocked that an AI company would train on someone else's data without permission or compensation.

In its motion to dismiss, OpenAI wrote that “it has long been clear that the non-consumptive use of copyrighted material (like large language model training) is protected by fair use.” Part of OpenAI’s argument in the New York Times case is also that the only way to make a generalist large language model that performs well is by sucking up gigantic amounts of data. As Sacks mentioned, “distillation” is an established principle in artificial intelligence research, and it is done all the time to refine and improve the accuracy of smaller large language models.

Related news:

Chinese firms ‘distilling’ US AI models to create rival products, warns OpenAI

OpenAI Says It Has Evidence DeepSeek Used Its Model To Train Competitor

China’s cheap, open AI model DeepSeek thrills scientists — DeepSeek-R1 performs reasoning tasks at the same level as OpenAI’s o1, and is open for researchers to examine