Stephen King, Zadie Smith and Rachel Cusk's Pirated Works Used To Train AI
Zadie Smith, Stephen King, Rachel Cusk and Elena Ferrante are among thousands of authors whose pirated works have been used to train artificial intelligence tools, a story in The Atlantic has revealed. The Guardian: More than 170,000 titles were fed into models run by companies including Meta and Bloomberg, according to an analysis of "Books3" -- the dataset harnessed by the firms to build their AI tools. Books3 was used to train Meta's LLaMA, one of a number of large language models -- the best-known of which is OpenAI's ChatGPT -- that can generate content based on patterns identified in sample texts. The dataset was also used to train Bloomberg's BloombergGPT and EleutherAI's GPT-J, and it is "likely" to have been used in other AI models. The titles contained in Books3 are roughly one-third fiction and two-thirds nonfiction, and the majority were published within the last two decades. Along with Smith, King, Cusk and Ferrante's writing, copyrighted works in the dataset include 33 books by Margaret Atwood, at least nine by Haruki Murakami, nine by bell hooks, seven by Jonathan Franzen, five by Jennifer Egan and five by David Grann. Books by George Saunders, Junot Díaz, Michael Pollan, Rebecca Solnit and Jon Krakauer also feature, as well as 102 pulp novels by Scientology founder L Ron Hubbard and 90 books by pastor John MacArthur. The titles span large and small publishers including more than 30,000 published by Penguin Random House, 14,000 by HarperCollins, 7,000 by Macmillan, 1,800 by Oxford University Press and 600 by Verso.