OpenAI destroyed a trove of books used to train AI models
New legal filings reveal that OpenAI deleted two massive datasets it had used to train a powerful AI model. The employees who built the datasets are gone.
Newly unsealed documents in the class action lawsuit brought by the Authors Guild against OpenAI reveal that the startup deleted two huge datasets, named "books1" and "books2," that had been used to train its GPT-3 AI model. An unsealed letter from OpenAI's lawyers, labeled "highly confidential - attorneys' eyes only," says that the use of books1 and books2 for model training was discontinued in late 2021 and that the datasets were deleted in mid-2022 because they were no longer in use. Lawyers for the Authors Guild said in court filings that the datasets likely contained "more than 100,000 published books" and are central to the guild's allegations that OpenAI used copyrighted materials to train AI models.