Cloudflare Rolls Out Feature For Blocking AI Companies' Web Scrapers
Cloudflare today unveiled a new feature of its content delivery network (CDN) that prevents AI developers from scraping content on the web. According to Cloudflare, the feature is available for both the free and paid tiers of its service. SiliconANGLE reports: The feature uses AI to detect automated content extraction attempts. According to Cloudflare, its software can spot bots that scrape content for LLM training projects even when they attempt to avoid detection. "Sadly, we've observed bot operators attempt to appear as though they are a real browser by using a spoofed user agent," Cloudflare engineers wrote in a blog post today. "We've monitored this activity over time, and we're proud to say that our global machine learning model has always recognized this activity as a bot." One of the crawlers that Cloudflare managed to detect is a bot that collects content for Perplexity AI Inc., a well-funded search engine startup. Last month, Wired reported that the manner in which the bot scrapes websites makes its requests appear as regular user traffic. As a result, website operators have struggled to block Perplexity AI from using their content. Cloudflare assigns every website visit that its platform processes a score of 1 to 99. The lower the number, the greater the likelihood that the request was generated by a bot. According to the company, requests made by the bot that collects content for Perplexity AI consistently receive a score under 30. "When bad actors attempt to crawl websites at scale, they generally use tools and frameworks that we are able to fingerprint," Cloudflare's engineers detailed. "For every fingerprint we see, we use Cloudflare's network, which sees over 57 million requests per second on average, to understand how much we should trust this fingerprint." Cloudflare will update the feature over time to address changes in AI scraping bots' technical fingerprints and the emergence of new crawlers.
As part of the initiative, the company is rolling out a tool that will enable website operators to report any new bots they may encounter.
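To make the scoring rule concrete, here is a minimal Python sketch of how a site operator might act on a 1-to-99 bot score in which lower values mean a request is more likely automated. It is not Cloudflare's implementation or API: the `Request` class, the `classify` function, and the cutoff of 30 are hypothetical, with the cutoff chosen only because the article says the Perplexity-related bot consistently scores under 30.

```python
from dataclasses import dataclass

# Assumption: an illustrative cutoff, not a documented Cloudflare default.
LIKELY_BOT_THRESHOLD = 30


@dataclass(frozen=True)
class Request:
    """Hypothetical request record carrying a bot score from 1 to 99."""
    path: str
    user_agent: str
    bot_score: int  # lower means more likely to be a bot


def classify(request: Request, threshold: int = LIKELY_BOT_THRESHOLD) -> str:
    """Return "block" for scores below the threshold, otherwise "allow"."""
    if not 1 <= request.bot_score <= 99:
        raise ValueError(f"bot score out of range: {request.bot_score}")
    # The user agent is deliberately ignored: a spoofed browser string
    # does not change the decision, only the score does.
    return "block" if request.bot_score < threshold else "allow"


if __name__ == "__main__":
    spoofed = Request("/article", "Mozilla/5.0 (Windows NT 10.0)", bot_score=12)
    human = Request("/article", "Mozilla/5.0 (Windows NT 10.0)", bot_score=85)
    print(classify(spoofed))
    print(classify(human))
```

Both sample requests carry the same browser-like user agent, so only the score separates them, which mirrors the article's point that a spoofed user agent alone does not hide a crawler.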