Revealed: The Authors Whose Pirated Books Are Powering Generative AI

Aug 22, 2023
The tech giant Meta, previously known as Facebook, is facing allegations of copyright infringement. Authors Sarah Silverman, Richard Kadrey, and Christopher Golden claim that Meta used their books to train LLaMA, a large language model akin to OpenAI's GPT-4.
Let’s play this out. Suppose the case makes its way to the Supreme Court, which rules in favor of the book authors. What happens to all the existing models that were trained on that data? AI is generally a black box, so it will be hard to pinpoint which model was trained on what. Virtually every major LLM would be impacted. It will also be hard to “unpull the trigger” here, especially since so many models are now open source and downloadable locally. There may be a judgment day in the court system soon, so we’re keeping 👀 on this one.
A recent investigation revealed that LLaMA's training data indeed contains over 170,000 books, including works by renowned authors like Stephen King, Michael Pollan, and Margaret Atwood. This dataset, known as "Books3," has also been utilized by other tech entities, including Bloomberg. The origin of Books3 traces back to a collection of pirated books from Bibliotik. While some argue that using copyrighted material for AI training falls under "fair use," the debate intensifies as the tech and publishing worlds clash over intellectual property rights. In a twist of irony, while Meta defends its own intellectual property fiercely, it's accused of benefiting from the unauthorized use of others'.