Meta ‘discussed buying publisher Simon & Schuster to train AI’ | Books | The Guardian

Audio shared with the New York Times appears to record executives discussing purchase of the US books giant to feed into its large language models

Staff at technology company Meta discussed buying publishing house Simon & Schuster last year in order to procure books to train the company’s artificial intelligence tools, it has been reported.

According to recordings of internal meetings shared with the New York Times, managers, lawyers and engineers at Meta met on a near-daily basis between March and April 2023 to discuss how the company could get hold of more data to train AI models. From the recordings, which were shared by an employee of the company, which is run by Mark Zuckerberg and owns Facebook and Instagram, the New York Times found that staff had discussed buying Simon & Schuster and some had debated paying $10 per book for the licensing rights to new titles.

Simon & Schuster is one of the English-speaking world’s major book publishing houses and is part of what is referred to as the “Big Five”, along with Penguin Random House, HarperCollins, Hachette and Macmillan. Simon & Schuster’s authors include Stephen King, Colleen Hoover and Bob Woodward.

In March 2020, Paramount Global, the parent company of Simon & Schuster, announced its intention to sell the publisher. After a much-criticised planned merger with Penguin Random House was blocked by US courts, Simon & Schuster was eventually sold to private equity firm KKR in August 2023.

According to the recordings, Ahmad Al-Dahle, Meta’s vice president of generative AI, told executives that the company had used almost every book, poem and essay written in English available on the internet to train models, so was looking for new sources of training material.

Employees said they had used these text sources without permission and talked about using more, even if that would result in lawsuits. When a lawyer flagged “ethical” concerns about using intellectual property, they were met with silence.


Staff also discussed having hired contractors in Africa to aggregate summaries of fiction and non-fiction texts, which contained copyrighted content “because we have no way of not collecting that”, said one manager.

Maria A Pallante, president of the Association of American Publishers, does not believe that Simon & Schuster would have agreed to such a sale. “The fact that Meta sought to purchase one of the most important publishing houses in American history in order to ingest its venerable catalogue for AI profits is puzzling even for Big Tech,” she said. “Did Meta plan to trample the primary mission of Simon & Schuster, and its contractual partnerships with authors, by sheer power?”

In November 2023, California federal judge Vince Chhabria dismissed part of a copyright lawsuit brought by comedian Sarah Silverman and other authors against Meta over the use of copyrighted books in training its AI system LLaMA. Chhabria cast doubt on the argument that the models’ outputs resemble the authors’ works.