OpenAI’s models ‘memorized’ copyrighted content, new study suggests


A new study appears to lend credence to allegations that OpenAI trained at least some of its AI models on copyrighted content.

OpenAI is embroiled in suits brought by authors, programmers, and other rights-holders who accuse the company of using their works — books, codebases, and so on — to develop its models without permission. OpenAI has long claimed a fair use defense, but the plaintiffs in these cases argue that there isn’t a carve-out in U.S. copyright law for training data. Abhilasha Ravichander, a doctoral student at the University of Washington and a co-author of the study, told TechCrunch that the findings shed light on the “contentious data” models might have been trained on.

Originally published on TechCrunch.