Harvard Makes 1 Million Books Available to Train AI Models


The dataset includes books that are in the public domain and no longer protected by copyright.

Publishers including the Wall Street Journal and the New York Times have sued OpenAI and its competitor Perplexity for ingesting their data without permission. As AI companies run out of new content to train on, commonly used web sources that are already included in training sets have quickly begun restricting access. Elon Musk’s X has an exclusive arrangement with his other company, xAI, giving its models access to the social network’s content for training and retrieval of current information.

Related news:

US firm beats Microsoft, Harvard with 50 entangled logical qubits in quantum computer | Records are tumbling in months as companies race to build fault tolerant quantum computers.

Harvard and Google to release 1 million public-domain books as AI training dataset

Harvard Is Releasing a Massive Free AI Training Dataset Funded by OpenAI and Microsoft