
Harvard Makes 1 Million Books Available to Train AI Models

The dataset includes public-domain books, which are no longer protected by copyright.

Publishers including the Wall Street Journal and the New York Times have sued OpenAI and competitor Perplexity for ingesting their data without permission. As AI companies run out of new content to use, commonly used web sources already included in training sets have quickly begun restricting access. Elon Musk’s X has an exclusive arrangement with his other company, xAI, that gives its models access to the social network’s content for training and for retrieving current information.
