The AI Isn't Scraping Data, Just Correlations.
looorg writes:
The scraping defence: they are not scraping content for their AI models, they are just looking for statistical correlations to add to their models.
Earlier this year, several authors sued NVIDIA over alleged copyright infringement. The class action lawsuit alleged that the company's AI models were trained on copyrighted works and specifically mentioned the Books3 dataset. Since this happened without permission, the rightsholders demand compensation.
The lawsuit was followed up by a near-identical case a few weeks later, and NVIDIA plans to challenge both in court by denying the copyright infringement allegations.
In its initial response, filed a few weeks ago, NVIDIA did not deny that it used the Books3 dataset. Like many other AI companies, it believes that the use of copyrighted data for AI training is a prime example of fair use, especially when the output of the model doesn't reproduce copyrighted works.
The authors clearly have a different take. They allege that NVIDIA willingly copied an archive of pirated books to train its commercial AI model, and are demanding damages for direct copyright infringement.
[...] NVIDIA also shared its early outlook on the case. The company believes that AI companies should be allowed to use copyrighted books to train their AI models, as these books are made up of "uncopyrightable facts and ideas" that are already in the public domain.
The argument may seem surprising at first; the authors own copyrights and, as far as they're concerned, use of pirated copies leads to liability as a direct infringer. However, NVIDIA goes on to explain that its AI models don't see these works that way.
AI training doesn't involve any book reading skills, or even a basic understanding of a storyline. Instead, it simply measures statistical correlations and adds these to the model.
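As a toy illustration of what "measuring statistical correlations" can mean, the sketch below counts which word follows which in a short text and turns the counts into probabilities. This is an assumption-laden simplification: a bigram counter is far simpler than the models at issue in the lawsuit, and the function names and sample corpus are hypothetical, not taken from NVIDIA's filing.

```python
from collections import Counter, defaultdict

def train_bigram_counts(text):
    """Count how often each word is followed by each other word."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def next_word_probabilities(counts, word):
    """Turn the raw follower counts for `word` into probabilities."""
    followers = counts.get(word, Counter())
    total = sum(followers.values())
    if total == 0:
        return {}
    return {w: c / total for w, c in followers.items()}

# Hypothetical sample text, made up for this illustration.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigram_counts(corpus)
probs = next_word_probabilities(model, "the")
print(probs)
```

What this toy "model" keeps is a table of word-pair frequencies. Whether that analogy carries over to large neural networks trained on whole books is part of what the court is being asked to weigh.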
Read more of this story at SoylentNews.