Fresh concerns raised over sources of training material for AI systems

by Alex Hern, UK technology editor

Investigations reveal limited efforts to 'clean' datasets of fascist, pirated and malicious material

Fresh fears have been raised about the training material used for some of the largest and most powerful artificial intelligence models, after several investigations exposed the fascist, pirated and malicious sources from which the data is harvested.

One such dataset is the Colossal Clean Crawled Corpus, or C4, assembled by Google from more than 15m websites and used to train both the search engine's LaMDA AI and Meta's GPT competitor, LLaMA.

Source RSS or Atom Feed
Feed Location http://feeds.theguardian.com/theguardian/technology/rss