Harvard and Google to release 1 million public-domain books as AI training dataset


AI training data carries a big price tag, one best suited to deep-pocketed tech firms. This is why Harvard University plans to release a dataset of roughly 1 million public-domain books, spanning genres, languages, and authors including Dickens, Dante, and Shakespeare, whose works are no longer copyright-protected due to their age. Harvard first teased the Institutional Data Initiative (IDI) back in March, outlining its plans to create a “trusted conduit for legal data for AI.” Little had been heard from it until its formal launch today, which came with confirmation that the IDI has financial backing from Microsoft and OpenAI.

Read this on TechCrunch
