Bots, Spiders, Scrapers and AI - Connections to SN
Over recent weeks we have been experiencing connections from a large number of bots, spiders and scrapers. Some are the expected ones (Microsoft, Google, Amazon etc) and these tend to rate limit their requests and cause us little problem.
Others appear to be AI driven scrapers and they can result in tying up a large percentage of the site's resources. For the most part they ignore robots.txt or when we return code 429. While they are individually only an annoyance their activity can affect the speed at which the site can respond to members attempts to view a page or leave a comment. They have contributed to some of the 404 or 503 (Backend Fetch Failed) that you might have experienced recently. A small number of bots isn't a problem, but if many bots are querying the site at the same time then they can affect the speed at which the site can respond to your comment or request.
Software has been developed to block such abusive sites for a short period. In the majority of cases this will be invisible to you as users other than to hopefully improve the responsiveness of the site.
However, it is possible that sometimes there might be a false positive and you may encounter difficulties in connecting to the site. If you do experience connection problems please inform us immediately either by email or on IRC. Neither of those apply filters to connections; the short temporary blocks only apply to the site itself. We will have to contact you by email to ascertain your IP address so that we can lift any block that may have been incorrectly applied. Please do not publish an IP address in either a comment or on IRC.
If you are using a VPN or Tor it might be advisable to try another routing to circumvent any temporary block that might be affecting your connection.
Read more of this story at SoylentNews.