YouTube creators surprised to find Apple and others trained AI on their videos

by Samuel Axon, from Ars Technica
YouTuber Marques Brownlee discusses iOS 18 in a new video. This specific video wasn't part of the large dataset that was used to train AI models, but many of his others were. (credit: Marques Brownlee)

AI models at Apple, Salesforce, Anthropic, and other major technology players were trained on tens of thousands of YouTube videos without the creators' consent and potentially in violation of YouTube's terms, according to a new report appearing in both Proof News and Wired.

The companies trained their models in part on "the Pile," a dataset assembled by the nonprofit EleutherAI to give individuals and companies that lack the resources to compete with Big Tech a useful training corpus. Those bigger companies have since used it as well.

The Pile includes books, Wikipedia articles, and much more. Among its contents are captions from 173,536 YouTube videos across more than 48,000 channels, collected through YouTube's captions API. They include videos from big YouTubers like MrBeast, PewDiePie, and popular tech commentator Marques Brownlee. On X, Brownlee called out Apple's usage of the dataset but acknowledged that assigning blame is complex when Apple did not collect the data itself.
