Kansas Reflector Mostly Admits That Meta’s Blocking Of Their Site Wasn’t Deliberate
Not every bad mistake is evil. Not every poor decision is deliberate. Especially in these more automated times. Sometimes, machines just make mistakes, and it's about time we came to terms with that simple fact.
Last week, we wrote about how, while Meta may be a horrible awful company that you should not trust, there was no evidence suggesting that its blocking of the local news site, the Kansas Reflector, soon after it published a mildly negative article about Meta, was in any way deliberate.
As we pointed out, false positives happen all the time with automated blocking tools, where classifiers mistakenly decide a site or a page (or an email) is problematic. And that's just how this works: if you tune a system for fewer false positives, you end up with more false negatives, which means more actually dangerous or problematic content (phishing sites, malware, etc.) gets through. At some point, you simply have to decide which types of errors are more important to stop and tweak the system accordingly.
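To make that tradeoff concrete, here's a minimal sketch with invented risk scores and labels (nothing here reflects Meta's actual systems), showing how moving a single blocking threshold swaps one kind of error for the other:

```python
# Toy illustration: (risk_score, is_actually_bad) for ten hypothetical sites.
sites = [
    (0.05, False), (0.20, False), (0.35, False), (0.45, True), (0.55, False),
    (0.60, True), (0.70, False), (0.80, True), (0.90, True), (0.95, True),
]

def count_errors(threshold):
    """Block everything scoring at or above `threshold`, then tally the mistakes."""
    false_positives = sum(1 for score, bad in sites if score >= threshold and not bad)
    false_negatives = sum(1 for score, bad in sites if score < threshold and bad)
    return false_positives, false_negatives

for threshold in (0.3, 0.5, 0.7, 0.9):
    fp, fn = count_errors(threshold)
    print(f"threshold={threshold:.1f}  false positives={fp}  false negatives={fn}")
```

Run it and the pattern is clear: every threshold setting that cuts false positives lets more genuinely bad sites through, and no setting eliminates both.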
In general, it's probably better to get more false positives than false negatives: you'd rather have fewer actual scams and less malware getting through, even though that's ridiculously annoying and problematic for the sites caught in the crossfire. Hell, last year, Microsoft Bing and DuckDuckGo banned all Techdirt links for a good five months or so. There was nothing I could do about it. At least I knew it was likely just yet another false positive, because such false positives happen all the time.
I also knew there would likely never be a good explanation for what happened (Microsoft and DuckDuckGo refused to comment), because the companies running these systems often don't have full visibility into what happened either. Some people think this is a condemnation of the system, but I don't think it is. Classifier systems take in a very large number of signals and then decide whether that combination of signals suggests a problematic site or an acceptable one. And the signals and thresholds can (and do) change all the time.
Still, people who got mad at what I said last week kept insisting (1) that it must have been deliberate, and (2) that Meta had to give a full and clear explanation of how this happened. I found both propositions dubious: the first for all the reasons above, and the second because I know it's often just not possible to tell. Hell, on a much smaller scale, this is how our own spam filter works in the comments here at Techdirt. It takes in a bunch of signals and decides whether or not a comment is spam. And sometimes it makes mistakes. Sometimes it flags content that isn't spam. Sometimes it lets through content that is. In most cases, I have no idea why. It's just that when all the signals are weighted, that's what gets spit out.
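As a rough sketch of what that kind of signal-weighing looks like (the signal names, weights, and threshold here are invented for illustration, not how Techdirt's or Meta's filters are actually configured):

```python
# Hypothetical weighted-signal spam check: no single signal is damning on its
# own, but a few mild signals stacking up can push a comment over the line.
SIGNAL_WEIGHTS = {
    "many_links": 0.4,        # comment contains a lot of links
    "new_account": 0.3,       # first-time commenter
    "flagged_phrase": 0.8,    # matches a known spammy phrase
    "posted_in_burst": 0.5,   # several comments in a short window
}
THRESHOLD = 1.0

def spam_score(signals):
    """Sum the weights of whichever signals fired for this comment."""
    return sum(weight for name, weight in SIGNAL_WEIGHTS.items() if signals.get(name))

# A perfectly legitimate comment can still trip the filter when mild signals line up.
comment = {"many_links": True, "new_account": True, "posted_in_burst": True}
score = spam_score(comment)
print(f"score={score:.1f} ->", "flagged as spam" if score >= THRESHOLD else "allowed")
```

Nothing in that output tells you *why* the comment was flagged in any satisfying sense; the answer is just that enough weak signals added up.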
And so, it's little surprise that the Kansas Reflector is now more or less admitting what I suggested was likely last week: that it was just Meta's automated detection system (though they make it sound scarier by calling it "AI") that made a bad call, and that even Meta probably couldn't explain why it happened:
Facebook's unrefined artificial intelligence misclassified a Kansas Reflector article about climate change as a security risk, and in a cascade of failures blocked the domains of news sites that published the article, according to technology experts interviewed for this story and Facebook's public statements.
The assessment is consistent with an internal review by States Newsroom, the parent organization of Kansas Reflector, which faults Facebook for the shortcomings of its AI and the lack of accountability for its mistake.
It isn't clear why Facebook's AI determined the structure or content of the article to be a threat, and experts said Facebook may not actually know what attributes caused the misfire.
Basically, this is exactly what I suggested likely happened (and which got a bunch of people mad at me). The Kansas Reflector story about it is a bit misleading because it keeps referring to the automated systems as "AI" (which is a stretch) and also suggests that all this shows that Meta is somehow not sophisticated here, quoting the ACLU's Daniel Kahn Gillmor:
"That's just not their core competency," Gillmor said. "At some level, you can see Facebook as somebody who's gotten out ahead of their skis. Facebook originally was a hookup app for college students back in the day, and all of a sudden we're now asking it to help us sort fact from fiction."
But I think that's basically wrong. Meta may not be particularly good at many things, and the company may have very screwed-up incentives, but fundamentally, its basic trust & safety operation is absolutely one of the company's core competencies. It's bad at it, because every company is bad at this, but Meta's content moderation tools are far more sophisticated than most others'.
Part of the issue is simply that the volume of content Meta reviews is so large that even a very, very small error rate produces many, many mistakes, both false positives and false negatives. You can argue that the answer to this is less scale, but that raises other questions, especially in a world where people all over the globe apparently want to be able to connect with one another.
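Some back-of-the-envelope arithmetic makes the point; the numbers below are assumptions for illustration, not Meta's real figures:

```python
# Illustrative only: both inputs are assumed, not published Meta statistics.
items_screened_per_day = 1_000_000_000   # assumed volume of posts/links checked daily
false_positive_rate = 0.0001             # assume 0.01% of benign items get wrongly flagged

wrongly_flagged_per_day = items_screened_per_day * false_positive_rate
print(f"{wrongly_flagged_per_day:,.0f} legitimate items wrongly flagged per day")
# Even at 99.99% accuracy on benign content, that's 100,000 mistaken flags a day.
```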
But, at the very least, it's nice that the Kansas Reflector has published this article explaining that it's unlikely that, even if it wanted to, Meta could explain what happened here.
Sagar Samtani, director of the Kelley School of Business' Data Science and Artificial Intelligence Lab at Indiana University, said it is common for this kind of technology to produce false positives and false negatives.
He said Facebook is going through a "learning process," trying to evaluate how people across the globe might view different types of content and shield the platform from bad actors.
"Facebook is just trying to learn what would be appropriate, suitable content," Samtani said. "So in that process, there is always going to be a 'whoops,' like, 'We shouldn't have done that.'"
And, he said, Facebook may not be able to say why its technology misclassified Kansas Reflector as a threat.
"Sometimes it's actually very difficult for them to say something like that because sometimes the models aren't going to necessarily output exactly what the features are that may have tripped the alarm," Samtani said. "That may be something that's not within their technical capability to do so."
It's not even that it's not within the "technical capability" to do it, because that implies that if they just programmed it differently, it could tell you. Rather, there are so many different signals being weighed that there's no real way to explain what triggered things. It could be a combination of the number of links, the time it was posted, how it was shared, and a possible vulnerability on the site, each weighted differently. But when combined, they all worked to trip the wire saying "this site might be problematic."
Any one of those things by itself might not matter and might not trip anything, but the combination might. And that's not at all easy to explain, especially when the signals, the weights, and the thresholds are likely in constant flux.
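Here's a tiny sketch of why "what tripped it?" has no clean answer. The signal names, weights, and threshold are invented; the point is just that each signal alone sits well below the threshold, so only the combination explains the block, and that combination stops meaning much the moment the weights shift:

```python
# Hypothetical: four mild signals, none of which would trip anything by itself.
WEIGHTS = {
    "unusual_link_pattern": 0.35,
    "posted_off_hours": 0.30,
    "sudden_burst_of_shares": 0.25,
    "possible_site_misconfiguration": 0.20,
}
THRESHOLD = 1.0

combined = sum(WEIGHTS.values())
for signal, weight in WEIGHTS.items():
    print(f"{signal}: {weight:.2f} alone -> below threshold, nothing happens")
print(f"all four together: {combined:.2f} >= {THRESHOLD} -> site gets flagged")
# There is no single "reason" to report back: drop any one of these signals and
# the site passes, yet none of them is individually suspicious.
```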
Yes, this sucks for the Kansas Reflector (though it seems to have gotten a lot more attention because of all of this). But this is the nature of content moderation these days, and it's unlikely to change. Every site has to use some form of automation, and that's always going to lead to mistakes of one sort or another. It's fine to call out these mistakes, and even to make fun of Meta, but it helps to be realistic about the cause, so that people don't overreact and claim that a fairly typical automated mistake was actually a deliberate attempt to suppress speech critical of the company.