Article 71GAD While Meta Crawls the Web for AI Training Data, Bruce Ediger Pranks Them with Endless Bad Data

by EditorDavid
from Slashdot on (#71GAD)
From the personal blog of interface expert Bruce Ediger:

Early in March 2025, I noticed that a web crawler with a useragent string of meta-externalagent/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler) was hitting my blog's machine at an unreasonable rate. I followed the URL and discovered this is what Meta uses to gather premium, human-generated content to train its LLMs. I found the rate of requests to be annoying. I already have a PHP program that creates the illusion of an infinite website. I decided to answer any HTTP request that had "meta-externalagent" in its user agent string with the contents of a bork.php generated file... This worked brilliantly. Meta ramped up to requesting 270,000 URLs on May 30 and 31, 2025...

After about 3 months, I got scared that Meta's insatiable consumption of Super Great Pages about condiments, underwear and circa 2010 C-List celebs would start costing me money. So I switched to giving "meta-externalagent" a 404 status code. I decided to see how long it would take one of the highest valued companies in the world to decide to go away.

The answer is 5 months.
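The approach described in the quoted post (match a substring of the user agent, serve generated junk, later switch to a 404) can be sketched roughly as follows. This is a minimal illustration in Python, not Ediger's PHP program: the word list, the `fake_page` and `respond` functions, and the `serve_junk` flag are all hypothetical names invented for this sketch, and the contents of the real bork.php output are not given in the source.

```python
import hashlib
import random

# Hypothetical vocabulary, loosely based on the topics named in the post.
WORDS = ["mustard", "ketchup", "relish", "socks", "boxers", "celebrity", "great", "super"]

# Substring of the crawler's user-agent string quoted in the post.
TARGET = "meta-externalagent"


def fake_page(path: str, n_links: int = 5) -> str:
    """Build a deterministic nonsense page whose links lead to more nonsense pages."""
    # Seed from the path so the same URL always produces the same page.
    seed = int.from_bytes(hashlib.sha256(path.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)
    text = " ".join(rng.choice(WORDS) for _ in range(40))
    # Every generated link is itself a valid input path, so the site never "ends".
    links = "".join(
        f'<a href="/{rng.choice(WORDS)}-{rng.randrange(10**6)}">{rng.choice(WORDS)}</a> '
        for _ in range(n_links)
    )
    return f"<html><body><p>{text}</p>{links}</body></html>"


def respond(user_agent: str, path: str, serve_junk: bool = True):
    """Return (status, body) for the targeted crawler, or None for everyone else."""
    if TARGET not in user_agent:
        return None  # let the normal site handle the request
    if serve_junk:
        return 200, fake_page(path)  # first phase: endless generated pages
    return 404, "Not Found"  # second phase: refuse the crawler outright
```

Because each page is derived from its own path and links to further generated paths, a crawler that follows links never runs out of URLs. Setting `serve_junk=False` corresponds to the later phase in which the crawler received only 404 responses.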


