Inside Meta’s race to beat OpenAI: “We need to learn how to build frontier and win this race”

A trove of newly released documents reveals Meta’s plans to use book piracy site LibGen to train its AI models.

An undated email from Meta director of product Sony Theakanath, sent to VP of AI research Joelle Pineau, weighed three options: using LibGen internally only, using it for benchmarks included in a blog post, or creating a model trained on the site.

The court documents stem from a class action lawsuit that author Richard Kadrey, comedian Sarah Silverman, and others filed against Meta, accusing it of using illegally obtained copyrighted content to train its AI models in violation of intellectual property laws.

Separately, Bloomberg reported that frontier labs like OpenAI and Google have been paying digital content creators between $1 and $4 per minute for their unused video footage through a third party to train LLMs (both of those companies have competing AI video-generation products).

Source: The Verge