
OpenAI Accused of Training GPT-4o on Unlicensed O'Reilly Books


A new paper [PDF] from the AI Disclosures Project claims OpenAI likely trained its GPT-4o model on paywalled O'Reilly Media books without a licensing agreement. The nonprofit organization, co-founded by O'Reilly Media CEO Tim O'Reilly himself, used a method called DE-COP to detect copyrighted content. The technique, also known as a "membership inference attack," tests whether a model can reliably distinguish human-authored texts from paraphrased versions. "GPT-4o [likely] recognizes, and so has prior knowledge of, many non-public O'Reilly books published prior to its training cutoff date," wrote the co-authors, who include O'Reilly, economist Ilan Strauss, and AI researcher Sruly Rosenblat.
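The researchers' own code is not described in the article, so the following Python is only a minimal sketch, under stated assumptions, of the multiple-choice idea behind this kind of membership inference test: a model is shown a verbatim excerpt hidden among paraphrases and asked to pick the original. If it picks the original far more often than chance on one set of books but not on a control set (here, hypothetically, books published after the training cutoff), that hints at prior exposure. The four-option format, the `Chooser` interface, `make_memorizing_model`, and the placeholder data are all assumptions for illustration, not the paper's method or any real API.

```python
import random
from typing import Callable, Sequence

# Hypothetical interface (assumption): shows the model several candidate
# passages and returns the index of the one it believes is the verbatim original.
Chooser = Callable[[Sequence[str]], int]


def quiz_accuracy(items, choose: Chooser, rng: random.Random) -> float:
    """Fraction of quiz items on which `choose` picks the verbatim passage.

    Each item is (original_excerpt, paraphrases). The original is shuffled
    among its paraphrases so its position gives no hint.
    """
    correct = 0
    for original, paraphrases in items:
        options = [original, *paraphrases]
        rng.shuffle(options)
        if choose(options) == options.index(original):
            correct += 1
    return correct / len(items)


def make_memorizing_model(seen: set) -> Chooser:
    """Toy stand-in for an LLM (assumption, not a real model): it 'recognizes'
    any passage in `seen` and otherwise guesses uniformly at random."""
    guess_rng = random.Random(0)

    def choose(options: Sequence[str]) -> int:
        for i, text in enumerate(options):
            if text in seen:
                return i
        return guess_rng.randrange(len(options))

    return choose


# Placeholder strings stand in for real book excerpts and their paraphrases.
suspect = [(f"orig-{i}", [f"para-{i}-{k}" for k in range(3)]) for i in range(200)]
baseline = [(f"late-{i}", [f"lpara-{i}-{k}" for k in range(3)]) for i in range(200)]

# The toy model has "seen" only the suspect excerpts, not the baseline ones.
model = make_memorizing_model({orig for orig, _ in suspect})
rng = random.Random(42)
suspect_acc = quiz_accuracy(suspect, model, rng)
baseline_acc = quiz_accuracy(baseline, model, rng)
chance = 1 / 4  # one original plus three paraphrases per question

print(f"suspect: {suspect_acc:.2f}, baseline: {baseline_acc:.2f}, chance: {chance:.2f}")
```

The toy model makes the signal deliberately obvious: it scores perfectly on passages it has "seen" and lands near chance on the rest. A real evaluation would query an actual model and would need a statistical comparison between the two sets rather than a single pair of numbers.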


Or read this on Slashdot

Read more on:

OpenAI

training

books

Related news:

Researchers suggest OpenAI trained AI models on paywalled O’Reilly books

Sam Altman says that OpenAI’s capacity issues will cause product delays

OpenAI's built-in image generator for ChatGPT is now available to free users