AI's Free Web Scraping Days May be Over, Thanks to This New Licensing Protocol

by janrinok, from SoylentNews (#700JN)

upstart writes:

AI's free web scraping days may be over, thanks to this new licensing protocol:

AI companies are capturing as much content as possible from websites while also extracting information. Now, several heavyweight publishers and tech companies -- Reddit, Yahoo, People, O'Reilly Media, Medium, and Ziff Davis (ZDNET's parent company) -- have developed a response: the Really Simple Licensing (RSL) standard.

You can think of RSL as Really Simple Syndication's (RSS) younger, tougher brother. While RSS is about syndication, getting your words, stories, and videos out onto the wider web, RSL says: "If you're an AI crawler gobbling up my content, you don't just get to eat for free anymore."

The idea behind RSL is brutally simple. Instead of the old robots.txt file -- which only said, "yes, you can crawl me," or "no, you can't," and which AI companies often ignore -- publishers can now add something new: machine-readable licensing terms.
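To make the robots.txt limitation concrete, here is a minimal sketch using Python's standard urllib.robotparser module. The crawler name ExampleAIBot, the example.com URLs, and the helper crawl_allowed are made up for illustration. The point is that the only answer a crawler can get out of robots.txt is a yes or a no; there is nowhere to express attribution or payment terms, and nothing forces a crawler to ask in the first place.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one AI crawler is barred from /articles/,
# everyone else may fetch anything.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /articles/

User-agent: *
Allow: /
"""


def crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return the only thing robots.txt can express: allowed or not."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ExampleAIBot", "SomeOtherBot"):
        verdict = crawl_allowed(ROBOTS_TXT, agent, "https://example.com/articles/1")
        print(agent, "may crawl" if verdict else "may not crawl")
```

Compliance is voluntary: a crawler that never calls anything like crawl_allowed simply fetches the page.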

Want an attribution? You can demand it. Want payment every time an AI crawler ingests your work, or even every time it spits out an answer powered by your article? Yep, there's a tag for that too.

This approach allows publishers to define whether their content is free to crawl, requires a subscription, or will cost "per inference," that is, every time ChatGPT, Gemini, or any other model uses content to generate a reply.
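As a rough illustration of how these compensation models differ, the sketch below models them as a small Python data structure and works out what a crawler would owe under each. This is not RSL syntax: the names Compensation, LicenseTerms, and amount_owed, and the prices used, are assumptions invented for this example, and subscription billing is deliberately left out of the metered calculation.

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum


class Compensation(Enum):
    """Compensation models named in the article (labels are illustrative)."""
    FREE = "free"
    ATTRIBUTION = "attribution"
    SUBSCRIPTION = "subscription"
    PAY_PER_CRAWL = "pay-per-crawl"
    PAY_PER_INFERENCE = "pay-per-inference"


@dataclass(frozen=True)
class LicenseTerms:
    compensation: Compensation
    # Hypothetical price per crawl or per inference; unused by other models.
    unit_price: Decimal = Decimal("0")


def amount_owed(terms: LicenseTerms, crawls: int, inferences: int) -> Decimal:
    """Usage-metered amount owed under the given terms."""
    if terms.compensation is Compensation.PAY_PER_CRAWL:
        return terms.unit_price * crawls
    if terms.compensation is Compensation.PAY_PER_INFERENCE:
        return terms.unit_price * inferences
    # Free, attribution, and subscription carry no per-use charge in this sketch.
    return Decimal("0")


if __name__ == "__main__":
    per_inference = LicenseTerms(Compensation.PAY_PER_INFERENCE, Decimal("0.002"))
    print(amount_owed(per_inference, crawls=1, inferences=500))
```

Under per-crawl terms the bill depends on how often the page is fetched; under per-inference terms it depends on how often a model's answer draws on the content, which is why the same article can cost very different amounts under the two models.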

The key capabilities of RSL include:

  • A shared vocabulary that lets publishers define licensing and compensation terms, including free, attribution, pay-per-crawl, and pay-per-inference compensation.
  • An open protocol to automate content licensing and create internet-scale licensing ecosystems between content owners and AI companies.
  • Standardized, public catalogs of licensable content and datasets through RSS and Schema.org metadata.
  • An open protocol for encrypting digital assets to securely license non-public proprietary content, including paywalled articles, books, videos, and training datasets.
  • Support for collective licensing via the RSL Collective or any other RSL-compatible licensing server.

It's a clever fix for a complex problem. As Tim O'Reilly, the O'Reilly Media CEO and one of the RSL initiative's high-profile backers, said: "RSS was critical to the internet's evolution...but today, as AI systems absorb and repurpose that same content without permission or compensation, the rules need to evolve. RSL is that evolution."

O'Reilly's right. RSS helped the early web scale, whether through blogs, news syndication, or podcasts. But today's web isn't just competing for human eyeballs. It is now also competing to supply the training and reasoning fuel for AI models that, so far, aren't exactly paying the bills for the sites they're built on.

Of course, tech is one thing; business is another. That's where the RSL Collective comes in. Modeled on music's ASCAP and BMI, the nonprofit is essentially a rights-management clearinghouse for publishers and creators. Join for free, pool your rights, and let the Collective negotiate with AI companies to ensure you're compensated.
