Post Comment - Pipedot

Re: Editor Question (Score: 1)

by zocalo@pipedot.org on 2015-01-06 09:22 (#2WP0)

The sample posted earlier was the only one I'd ever seen, so I was quite surprised by the scale of the problem. Having it spammed into old threads would explain that, though, and is possibly one reason why Slashdot archives older discussions. You're right about the pain of having stuff dropped into a submission queue, too. Simply blocking common spam terms like "viagra" is obviously going to give many false positives on a site that might genuinely discuss them, and where they will probably turn up in humorous comments as well.

Getting back to the regexps, it's hard to say what (if anything) would work for Pipedot without a good overview of the crap being submitted. One general technique that does seem like it would work well for typical forum spam (including your example) is to trigger on excessive use of certain punctuation marks, particularly in subjects - commas and hyphens seem well liked by many forum spammers; the one in your example put four in there. Ideally you'd probably also want to require that multiple rules match before a post goes into the moderation queue, or even use a basic scoring system like SpamAssassin et al. do, but based on the comments above that's probably overkill - at least at present. Ultimately, though, it's still an arms race, and the spammers will adapt as soon as they realise they are being blocked; sometimes you just have to go for the easy stuff and accept that the rest might need manual handling later.
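
For what it's worth, here is a rough Python sketch of the punctuation trigger combined with a very basic score, just to show the shape of the idea. Everything in it is an assumption on my part rather than anything Pipedot actually runs: the rule names, the weights, the threshold of four commas/hyphens in a subject, and the cut-off score are all made up for illustration and would need tuning against real submissions.

```python
import re

# Commas and hyphens in the subject line (illustrative character set).
SUBJECT_PUNCT = re.compile(r"[,\-]")
LINK = re.compile(r"https?://", re.IGNORECASE)
SPAM_TERM = re.compile(r"\bviagra\b", re.IGNORECASE)


def punct_count(subject):
    """Count commas and hyphens in a subject line."""
    return len(SUBJECT_PUNCT.findall(subject))


# Hypothetical rules: (name, test over (subject, body), points).
RULES = [
    ("subject_punct", lambda s, b: punct_count(s) >= 4, 2.0),
    ("many_links", lambda s, b: len(LINK.findall(b)) >= 3, 1.5),
    ("spam_term", lambda s, b: bool(SPAM_TERM.search(s + " " + b)), 1.0),
]

# Assumed cut-off: no single rule reaches it, so at least two must match.
THRESHOLD = 3.0


def score_post(subject, body):
    """Return (total score, names of the rules that matched)."""
    hits = [(name, pts) for name, test, pts in RULES if test(subject, body)]
    return sum(pts for _, pts in hits), [name for name, _ in hits]


def needs_moderation(subject, body):
    """True if the post should go to the moderation queue."""
    total, _ = score_post(subject, body)
    return total >= THRESHOLD
```

Because no single weight reaches the threshold, a comment that merely mentions "viagra" in passing scores 1.0 and goes straight through, while a subject stuffed with commas and hyphens plus a body full of links gets queued. That's the "multiple rules must match" idea in its cheapest form.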