Now You Can Block OpenAI's Web Crawler

by msmash from Slashdot on (#6DNG9)
OpenAI now lets you block its web crawler from scraping your site to help train GPT models. From a report: OpenAI said website operators can specifically disallow its GPTBot crawler in their site's robots.txt file or block its IP address. "Web pages crawled with the GPTBot user agent may potentially be used to improve future models and are filtered to remove sources that require paywall access, are known to gather personally identifiable information (PII), or have text that violates our policies," OpenAI said in a blog post. For sources that don't fit the excluded criteria, "allowing GPTBot to access your site can help AI models become more accurate and improve their general capabilities and safety." Blocking GPTBot may be the first step toward OpenAI allowing internet users to opt out of having their data used for training its large language models. It follows earlier attempts at creating a flag to exclude content from training, such as the "NoAI" tag conceived by DeviantArt last year. Note that blocking the crawler does not retroactively remove content previously scraped from a site from ChatGPT's training data.
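For operators who want to act on this, a minimal robots.txt sketch follows, using the GPTBot user-agent string OpenAI announced; the paths shown are placeholders, and a site would substitute its own:

```
# Block OpenAI's GPTBot from the entire site
User-agent: GPTBot
Disallow: /
```

To block only part of a site instead, the same mechanism takes per-path rules, e.g. `Allow: /public/` together with `Disallow: /private/` under the `User-agent: GPTBot` line. The file must be served at the site root (`/robots.txt`) for the crawler to find it.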

Read more of this story at Slashdot.

External Content
Source RSS or Atom Feed
Feed Location https://rss.slashdot.org/Slashdot/slashdotMain
Feed Title Slashdot
Feed Link https://slashdot.org/
Feed Copyright Copyright Slashdot Media. All Rights Reserved.