Zuckerberg appeared to know Llama trained on LibGen


Newly unsealed documents show how Meta used LibGen, a pirated library of ebooks, to train its Llama 3 chatbot.

The AI rush has brought with it thorny questions of copyright and data ownership as tech companies train bots like ChatGPT on existing texts, but it seems Meta largely brushed these aside as it worked to integrate such tools into Facebook and Instagram. One email among the documents lays out the legal exposure and potential negative media attention that could follow if “external parties” deduce that the LibGen trove formed part of Llama’s training data: “Copyright and IP is top of mind for legislators around the world, including in the US and EU,” the document states. Other revealing messages show Meta’s AI research team and executives discussing the best methods for obtaining the LibGen data set other than torrenting it directly (that is, downloading it via peer-to-peer file sharing) from the company’s IP addresses.


Related news:

A Who’s Who at Peter Thiel’s Trump Party: Zuckerberg, Adelson and More

As Zuckerberg Goes Around Whining About Biden, He Made Sure To First Get His New Approach Approved By Trump

Donald Trump Has Mark Zuckerberg By the Balls. A TikTok ban is a massive prize that Zuckerberg has been laying the groundwork on for years. Will Trump let him have it?