Comment 2WV6 Re: Editor Question

Story

Spam Filtering

Editor Question (Score: 1)

by evilviper@pipedot.org on 2015-01-04 22:08 (#2WNK)

What's your recommendation to editors on using, or not, the "Ban IP" option for spam posts?

Here's as good a place as any...

Re: Editor Question (Score: 2, Insightful)

by zocalo@pipedot.org on 2015-01-05 07:51 (#2WNR)

Or the team could be more proactive on the backend. Many of the bots (or low-rent workers in 3rd world sweatshops, it's hard to tell these days) that stuff forums and submission queues seem to follow a fairly standard template, so a few well-crafted regexps combined with a tool like Fail2Ban feeding the IP blacklist might nail a lot of the low-hanging fruit before anyone even gets to see it.
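For illustration only, here is a rough Python sketch of that idea: match submissions against a few template regexps and collect the IPs that keep tripping them, which is roughly the job Fail2Ban does against a log file. The log format, the patterns and the `ips_to_ban` helper are all made up for the example; none of this is Pipedot's actual backend or Fail2Ban's own configuration syntax.

```python
import re
from collections import Counter

# Hypothetical log format: "<ip> <subject line>". Not Pipedot's real log layout.
LINE_RE = re.compile(r"^(?P<ip>\d{1,3}(?:\.\d{1,3}){3})\s+(?P<subject>.*)$")

# Hypothetical "standard template" patterns; real ones would have to come
# from looking at the spam that is actually being submitted.
SPAM_PATTERNS = [
    re.compile(r"(?i)\bcheap\b.*\bonline\b"),
    re.compile(r"(?i)\bbuy\b.*\bnow\b"),
]


def ips_to_ban(log_lines, max_hits=3):
    """Return the sorted IPs whose submissions matched a pattern max_hits times or more."""
    hits = Counter()
    for line in log_lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        if any(p.search(match.group("subject")) for p in SPAM_PATTERNS):
            hits[match.group("ip")] += 1
    return sorted(ip for ip, count in hits.items() if count >= max_hits)
```

The returned list is what would be fed to the IP blacklist. Requiring several hits before a ban is a guess at a sensible default, so that one unlucky subject line doesn't lock out a real user.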

Re: Editor Question (Score: 2, Informative)

by evilviper@pipedot.org on 2015-01-05 22:42 (#2WNX)

<blockquote>seem to follow a fairly standard template so a few well crafted regexp's combined with a tool like Fail2Ban feeding the IP blacklist might nail a lot of the low hanging fruit</blockquote>
Doesn't sound like you've seen the kind of spam |. has been getting flooded with... It's paragraph after paragraph of random nonsense words. There's no pattern to it; it's quite intentionally random. The only commonality is that they all have links in there, somewhere.

eg: http://pipedot.org/comment/2WL7

I tend to find it annoying when sites hold comments in a moderation queue, so I wouldn't like to see that happening here for every comment that happens to have a link in it... Or ones that just happen to mention "viagra".

Re: Editor Question (Score: 1)

by zocalo@pipedot.org on 2015-01-06 09:22 (#2WP0)

The sample posted earlier was the only one I'd ever seen, so I was quite surprised about the scale of the problem. Having it spammed into old threads would explain that though, which is possibly one reason why Slashdot archives older discussions. You're right about the pain of having stuff dropped into a submission queue though, and simply blocking common spam terms like "viagra" and the like is obviously going to give many false positives on a site that might discuss them, and will probably have them used in humorous comments elsewhere.

Getting back to the regexps, it's hard to say what (if anything) would work for Pipedot without a good overview of the crap being submitted, but one general technique that does seem like it would work well for typical forum spam (including your example) is to trigger off excessive use of certain punctuation marks, particularly in subjects - commas and hyphens seem well-liked by many forum spammers; the one in your example put four in there. Ideally you'd probably also want to have a requirement that multiple rules match before a post goes into the moderation queue, or even a basic scoring system like SpamAssassin et al. use, but based on the comments above that's probably overkill - at least at present. Ultimately though it's still an arms race, and the spammers will adapt as soon as they realise they are being blocked; sometimes you just have to go for the easy stuff and accept that the rest might need manual handling later.
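To make the scoring idea a bit more concrete, here is a minimal Python sketch, assuming a made-up `Post` record and made-up weights. The punctuation limit of four comes from the example above; the link rule, the old-thread rule, the 30-day cutoff and the threshold are guesses for illustration, not anything Pipedot or SpamAssassin actually uses.

```python
import re
from dataclasses import dataclass


@dataclass
class Post:
    # Hypothetical record; the field names are assumptions for this sketch.
    subject: str
    body: str
    thread_age_days: int = 0


LINK_RE = re.compile(r"https?://", re.IGNORECASE)


def punctuation_heavy_subject(post, limit=4):
    """True when the subject has at least `limit` commas and hyphens combined."""
    return sum(post.subject.count(ch) for ch in ",-") >= limit


def has_link(post):
    """True when the body contains something that looks like a URL."""
    return bool(LINK_RE.search(post.body))


def old_thread(post, days=30):
    """True when the post lands in a thread older than `days` days."""
    return post.thread_age_days > days


# (rule, weight) pairs; the weights are arbitrary illustrative values.
RULES = [(punctuation_heavy_subject, 2.0), (has_link, 1.0), (old_thread, 1.0)]


def spam_score(post):
    """Sum the weights of every rule the post matches."""
    return sum(weight for rule, weight in RULES if rule(post))


def needs_moderation(post, threshold=3.0):
    """Queue a post only when several rules add up past the threshold."""
    return spam_score(post) >= threshold
```

With these weights no single rule reaches the threshold on its own, so an ordinary comment that merely contains a link scores 1.0 and never hits the moderation queue.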

Re: Editor Question (Score: 1)

by beldin@pipedot.org on 2015-01-30 22:57 (#2WV6)

<blockquote>Ultimately though it's still an arms race, and the spammers will adapt...</blockquote>
<a href="http://xkcd.com/810/">We're on a mission</a>.
