AI Lab PleIAs Releases Fully Open Dataset, as AMD, Ai2 Release Open AI Models
French private AI lab PleIAs "is committed to training LLMs in the open," they write in a blog post at Mozilla.org. "This means not only releasing our models but also being open about every aspect, from the training data to the training code. We define 'open' strictly: all data must be both accessi...
"As developers are responding to pressures from new regulations like the EU AI Act, Common Corpus goes beyond compliance by making our entire permissibly licensed dataset freely available on HuggingFace, with detailed documentation of every data source.

- Diverse: consisting of scientific articles, government and legal documents, code, and cultural heritage data, including books and newspapers.

The Common Pile, currently in preparation under the coordination of Eleuther, is built around the same principle of using permissible content in English language and, unsurprisingly, there were many opportunities for collaborations and shared efforts."