MLCommons and Hugging Face team up to release massive speech dataset for AI research


The nonprofit AI safety org MLCommons has teamed up with Hugging Face to release Unsupervised People’s Speech, a public domain data set of speech recordings.

Because many of Archive.org’s contributors are English-speaking and American, almost all of the recordings in Unsupervised People’s Speech are in American-accented English, according to the readme on the official project page. MLCommons says that all recordings in the data set are public domain or available under Creative Commons licenses, but licensing mistakes remain possible. According to an MIT analysis, hundreds of publicly available AI training data sets lack licensing information and contain errors.

Read the full story on TechCrunch
