Five major publishing houses and bestselling author Scott Turow have filed a class action lawsuit against Meta and its chief executive Mark Zuckerberg, alleging the social media giant pirated millions of copyrighted works to train its artificial intelligence models.
The complaint, filed in the US District Court for the Southern District of New York, accuses Meta of wilful infringement of millions of textual works, including literature, educational material and scholarly articles, used to develop its Llama large language models.
Elsevier, Cengage, Hachette Book Group, Macmillan and McGraw Hill are the five publishers behind the action.
Authors published by the five firms include Turow, James Patterson, Donna Tartt and former US president Joe Biden, as well as at least two of this year's Pulitzer Prize winners, Yiyun Li and Amanda Vaill.
The lawsuit is striking in its direct targeting of Zuckerberg personally, not merely as a figurehead but as the individual alleged to have driven the infringement strategy.
The complaint claims Meta, at Zuckerberg's behest, downloaded unauthorised web scrapes of large swathes of the internet, including subscription-only content, and torrented protected books and journal articles from pirate sites such as LibGen and Anna's Archive.
After releasing Llama 1, Meta briefly considered entering licensing deals with major publishers and, between January and April 2023, discussed increasing its dataset licensing budget to as much as $200 million.
But in early April 2023, the company abruptly halted that licensing strategy after the question was escalated to Zuckerberg, the complaint alleges.
One Meta employee is quoted in the suit as describing the rationale: if the company licensed even a single book, it would undermine its ability to rely on a fair use defence.
The lawsuit alleges Zuckerberg and other Meta executives authorised the torrenting of more than 267 terabytes of pirated material, which the plaintiffs say is equivalent to hundreds of millions of publications and many times the size of the entire print collection of the Library of Congress.
The plaintiffs are seeking monetary damages and an injunction, including an order requiring Meta to destroy all infringing copies in its possession.
Meta pushed back on the claims. A spokesperson said courts have found that training AI on copyrighted material can qualify as fair use, and that the company would fight the lawsuit aggressively.
The case arrives against the backdrop of a rapidly evolving legal landscape for the AI industry and copyrighted content.
In June 2025, a federal judge rejected a claim brought by 13 authors, including Sarah Silverman, that Meta had violated their copyrights by training its AI model on their books.
Anthropic, the AI company behind the Claude chatbot, became the first major AI developer to settle such a case last year, agreeing to pay $1.5 billion to resolve a class action that could have cost it billions more in damages.
The new suit's emphasis on Zuckerberg's personal role, and its allegation that Meta deliberately chose piracy over licensing, marks an escalation in the publishing industry's confrontation with big tech over AI training data.
The recap
- Publishers sue Meta and Mark Zuckerberg for alleged copyright infringement.
- The Financial Times described the alleged infringement as "massive" in its headline.