This Is How Meta AI Staffers Deemed More Than 7 Million Books to Have No “Economic Value”
As more than a dozen lawsuits churn ahead, newly unsealed case files reveal the company’s stance: The pirated books Meta used to train its AI, including ones by Beverly Cleary, Jacqueline Woodson, and Andrew Sean Greer, are individually worthless.
Meta did hold preliminary discussions with publishers about potential licensing fees but, according to court documents, the numbers it received were, to the company’s mind, “wildly off.” In the transcript of a taped deposition that has been made public, the defense describes potential negotiations over licensing material as “a bit of a song and dance” that “takes a bunch of their time; it takes our time.” It also says that, because of how book publishing rights are structured, “absent fair use, Meta would have to initiate individualized negotiations with millions of authors,” which would entail “identifying individual books and their authors; determining how to contact them; ascertaining whether they own rights clear of encumbrances,” etc.
“They intend to make a lot of money off the entities they have reared and fattened on my words, so they could at least buy me a coffee.” (For the answer to this question, she need look no further than an email by a director of engineering at Meta, Sergey Edunov, in which he explains that “if we license one single book, we won’t be able to lean into fair use strategy.”)
The other concern is that “authors can’t enforce things like, you can’t allow outputs that include my work.” Some AI companies say that they’re putting filters on their tools that prevent the large language model from duplicating work verbatim; it’s one of the arguments at the heart of Meta’s fair use case.