Making It More Difficult to Shill Recommender Systems
fliptop writes:
Places like Amazon, Facebook, and Twitter are swimming in data, but their problem is that a lot of it is untrustworthy and shilled. The thing is, you don't need to use all the data. Toss data happily when deciding what to feed the recommender algorithms: anything suspicious at all, false positives galore, even accidentally marking new accounts or borderline accounts as shills. Who cares if you do?
Lately I've been thinking about recommender algorithms and how they go wrong. I keep hitting examples of people arguing that we should ban the fewest accounts possible when deciding which accounts feed recommender systems. Why? Or why not the opposite? What's wrong with using the fewest accounts you can without degrading the perceived quality of the recommendations?
The reason this matters is that recommender systems these days are struggling with shilling. Companies are playing whack-a-mole with bad actors who just create new accounts or find new shills every time they're whacked, because creating fake crowds that manipulate the algorithms is so profitable -- like free advertising. Propagandists and scammers are loving it and winning. It's easy and lucrative for them.
So what's wrong with taking the opposite strategy and only using the most reliable accounts? As a thought experiment, let's say you rank-order accounts by your confidence that they are human, independent, not shilling, and trustworthy. Then go down the list, using each account's behavior data, until the recommendations stop improving by a noticeable amount (being careful about cold start and the long tail). Then stop. Don't use the rest. Why not do that? It'd vastly increase costs for adversaries. And it wouldn't change the perceived quality of the recommendations, because you've made sure it wouldn't.
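For the curious, here is a minimal Python sketch of that thought experiment. The trust_score and evaluate functions are assumptions, stand-ins for whatever trust signals and offline quality metrics a platform actually has; this is not anyone's production algorithm, just the "go down the ranked list until quality plateaus" loop described above.

def select_trusted_accounts(accounts, trust_score, evaluate,
                            batch_size=10_000, min_gain=0.001):
    """Rank accounts by trust and add them in batches until the
    recommendation-quality metric stops improving noticeably."""
    # Sort by descending confidence the account is human, independent, not shilling.
    ranked = sorted(accounts, key=trust_score, reverse=True)
    selected = []
    best_quality = float("-inf")
    for start in range(0, len(ranked), batch_size):
        candidate = selected + ranked[start:start + batch_size]
        quality = evaluate(candidate)  # e.g. an offline metric on held-out behavior
        if selected and quality - best_quality < min_gain:
            break  # diminishing returns: stop here, don't use the rest
        selected, best_quality = candidate, quality
    return selected

The stopping rule is what raises adversaries' costs: a new or borderline account never reaches the trusted head of the ranking, so its data simply never touches the recommender.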
Previously:
Amazon Still Hasn't Fixed Its Problem with Bait-and-Switch Reviews
Amazon's Top UK Reviewers Appear to Profit From Fake 5-Star Posts