Is AI Running Out of Training Data?
fliptop writes:
Artificial intelligence has had a meteoric rise, but it's facing a shortage of training data:
"We've already run out of data," Neema Raphael, Goldman Sachs' chief data officer and head of data engineering, said on the bank's "Exchanges" podcast published on Tuesday.
Raphael said that this shortage may already be influencing how new AI systems are built.
He pointed to China's DeepSeek as an example, saying one hypothesis is that its purported development costs came from training on the outputs of existing models rather than on entirely new data.
[...] With the web tapped out, developers are turning to synthetic data - machine-generated text, images, and code. That approach offers limitless supply, but also risks overwhelming models with low-quality output or AI slop.
However, Raphael said he doesn't think the lack of fresh data will be a massive constraint, in part because companies are sitting on untapped reserves of information.
Rick Beato talked about [15:29 --JE] how he broke ChatGPT with a simple question and exposed the gaps in AI's "knowledge" that are filled with synthetic data.
Related: The Real (Economic) AI Apocalypse is Nigh
Read more of this story at SoylentNews.