Zuckerberg appeared to know Llama trained on LibGen

Newly unsealed documents show how Meta used LibGen, a pirated library of ebooks, to train its Llama 3 chatbot.

The AI rush has brought with it thorny questions of copyright and data ownership as tech companies train bots like ChatGPT on existing texts, but it seems Meta largely brushed these aside as it worked to integrate such tools into Facebook and Instagram. The same email lays out the legal exposures and potential negative media attention that could follow if “external parties” deduce that the LibGen trove formed part of Llama’s training data: “Copyright and IP is top of mind for legislators around the world, including in the US and EU,” the document states. Other revealing messages show Meta’s AI research team and executives discussing the best methods for obtaining the LibGen data set other than torrenting it directly, that is, downloading it via peer-to-peer file sharing, from the company’s IP addresses.
