MLCommons and Hugging Face team up to release massive speech dataset for AI research

The nonprofit AI safety org MLCommons has teamed up with Hugging Face to release a public domain data set of speech recordings.

Because many of Archive.org’s contributors are English-speaking (and American), almost all of the recordings in Unsupervised People’s Speech are in American-accented English, per the readme on the official project page. MLCommons says that all recordings in the data set are public domain or available under Creative Commons licenses, but mistakes may have been made. According to an MIT analysis, hundreds of publicly available AI training data sets lack licensing information and contain errors.

Source: TechCrunch