Perplexity is using stealth, undeclared crawlers to evade website no-crawl directives

by Thom Holwerda, from OSnews (#6Z3K0)

We are observing stealth crawling behavior from Perplexity, an AI-powered answer engine. Although Perplexity initially crawls from their declared user agent, when they are presented with a network block, they appear to obscure their crawling identity in an attempt to circumvent the website's preferences. We see continued evidence that Perplexity is repeatedly modifying their user agent and changing their source ASNs to hide their crawling activity, as well as ignoring - or sometimes failing to even fetch - robots.txt files.

The Internet as we have known it for the past three decades is rapidly changing, but one thing remains constant: it is built on trust. There are clear preferences that crawlers should be transparent, serve a clear purpose, perform a specific activity, and, most importantly, follow website directives and preferences. Based on Perplexity's observed behavior, which is incompatible with those preferences, we have de-listed them as a verified bot and added heuristics to our managed rules that block this stealth crawling.

The Cloudflare Blog

Never forget they destroyed Aaron Swartz's life - literally - for downloading a few JSTOR articles.
