OpenAI Accused of Training GPT-4o on Unlicensed O'Reilly Books

by msmash from Slashdot (#6WBBA)
A new paper from the AI Disclosures Project claims OpenAI likely trained its GPT-4o model on paywalled O'Reilly Media books without a licensing agreement. The nonprofit organization, co-founded by O'Reilly Media CEO Tim O'Reilly himself, used a method called DE-COP to detect copyrighted content in language model training data. Researchers analyzed 13,962 paragraph excerpts from 34 O'Reilly books and found that GPT-4o "recognized" significantly more paywalled content than older models like GPT-3.5 Turbo. The technique, a type of "membership inference attack," tests whether a model can reliably distinguish human-authored texts from paraphrased versions of the same texts. "GPT-4o [likely] recognizes, and so has prior knowledge of, many non-public O'Reilly books published prior to its training cutoff date," wrote the co-authors, who include O'Reilly, economist Ilan Strauss, and AI researcher Sruly Rosenblat.
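
The quiz-style test described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the toy passages, and the `choose` callback (a stand-in for querying a language model) are all hypothetical. The idea is that a model which has seen a passage in training should pick the verbatim text out of a set of paraphrases more often than chance.

```python
import random
from typing import Callable, Sequence


def recognition_rate(
    items: Sequence[tuple[str, Sequence[str]]],
    choose: Callable[[list[str]], int],
    seed: int = 0,
) -> float:
    """Fraction of quizzes in which `choose` picks the verbatim passage.

    Each item is (original_passage, paraphrases). `choose` is a hypothetical
    stand-in for a model call: it receives the shuffled options and returns
    the index of the option it believes is the original.
    """
    if not items:
        raise ValueError("need at least one quiz item")
    rng = random.Random(seed)
    hits = 0
    for original, paraphrases in items:
        options = [original, *paraphrases]
        rng.shuffle(options)  # randomize position so order gives no hint
        picked = choose(options)
        if options[picked] == original:
            hits += 1
    return hits / len(items)


def chance_rate(num_options: int) -> float:
    """Expected accuracy of a model that guesses uniformly at random."""
    return 1.0 / num_options


# Toy data (invented for illustration): one original plus three paraphrases.
items = [
    (
        "A closure captures variables from its enclosing scope.",
        [
            "Closures hold on to variables from the scope around them.",
            "Variables of the outer scope are retained by a closure.",
            "An enclosing scope's variables get captured by closures.",
        ],
    ),
    (
        "Indexes speed up reads at the cost of slower writes.",
        [
            "Reads get faster with indexes, but writes slow down.",
            "Using an index trades write speed for read speed.",
            "Indexes make writing slower while making reading faster.",
        ],
    ),
]

# Simulated "model" that has memorized the originals.
seen = {original for original, _ in items}


def memorized_chooser(options: list[str]) -> int:
    for i, text in enumerate(options):
        if text in seen:
            return i
    return 0


# Simulated "model" with no prior knowledge: always picks the first option.
def naive_chooser(options: list[str]) -> int:
    return 0


print("memorized:", recognition_rate(items, memorized_chooser))
print("naive:    ", recognition_rate(items, naive_chooser))
print("chance:   ", chance_rate(4))
```

A recognition rate well above the chance rate on many passages would be read as a sign of prior exposure. A real study would also need many more samples and a statistical comparison against texts the model could not have seen.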

External Content
Source: RSS or Atom Feed
Feed Location: https://rss.slashdot.org/Slashdot/slashdotMain
Feed Title: Slashdot
Feed Link: https://slashdot.org/
Feed Copyright: Copyright Slashdot Media. All Rights Reserved.