MLCommons and Hugging Face team up to release massive speech dataset for AI research
The nonprofit AI safety org MLCommons has teamed up with Hugging Face to release a public domain data set of speech recordings.
Because many of Archive.org’s contributors are English-speaking, and American, almost all of the recordings in Unsupervised People’s Speech are in American-accented English, per the readme on the official project page. MLCommons says that all recordings in the data set are public domain or available under Creative Commons licenses, but licensing mistakes remain possible. According to an MIT analysis, hundreds of publicly available AI training data sets lack licensing information and contain errors.
Or read this on TechCrunch