Story 2015-01-04 2WNE Spam Filtering

Spam Filtering

by
in pipedot on (#2WNE)
Recently, Soylent News discussed adding more labels to the moderation system. Although opinions on "Disagree" and "Factually Incorrect" may still be varied, nearly everyone supported the addition of a "Spam" label.

For Pipedot, we've gone ahead and added the latter. Moderating a comment as "Spam" will decrease its score by one and flag it for further review by an editor. This way, normal users can greatly help the editors identify junk comments.

Once an editor marks a comment as spam, the message will be "hidden" one step deeper than the normal "Hide Threshold" slider setting. However, comments are never deleted. If you want to continue to see all comments, including the spam, click the "Show Junk Comments" checkbox on your profile settings page. Similar to the current blue (new) and gray (seen) rendering, the title bar of junk comments will be colored red to easily differentiate them from the good stuff.
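
As a rough illustration of the rules described above (not Pipedot's actual code), the flagging and visibility logic might look like the sketch below. The names Comment, moderate_as_spam, mark_junk, and is_visible are hypothetical. Treating "one step deeper" as "hidden at every slider setting unless Show Junk Comments is checked" is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    score: int
    flagged: bool = False  # set when a user moderates the comment as "Spam"
    junk: bool = False     # set when an editor confirms the comment is spam


def moderate_as_spam(comment: Comment) -> None:
    """User moderation: lower the score by one and flag for editor review."""
    comment.score -= 1
    comment.flagged = True


def mark_junk(comment: Comment) -> None:
    """Editor action: the comment is kept, only marked as junk."""
    comment.junk = True


def is_visible(comment: Comment, hide_threshold: int,
               show_junk: bool = False, direct_link: bool = False) -> bool:
    # A direct link always shows the comment, junk or not.
    if direct_link:
        return True
    # Assumption: junk stays hidden at every slider setting unless the
    # reader has opted in via "Show Junk Comments".
    if comment.junk and not show_junk:
        return False
    return comment.score >= hide_threshold


c = Comment(score=1)
moderate_as_spam(c)                       # score drops to 0, flagged for review
mark_junk(c)                              # editor confirms it is spam
print(is_visible(c, hide_threshold=-1))   # hidden even at the lowest setting
print(is_visible(c, hide_threshold=-1, show_junk=True))
```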

Direct Link (Score: 2, Informative)

by bryan@pipedot.org on 2015-01-04 09:12 (#2WNF)

Directly linking to the comment will also show the comment, regardless of its junk status. Example: #2VAK

Re: Direct Link (Score: 2, Insightful)

by Anonymous Coward on 2015-01-05 09:26 (#2WNT)

Not having direct links was one of the key reasons I loathed /.Beta

Good! (Score: 1)

by nightsky30@pipedot.org on 2015-01-04 13:18 (#2WNG)

I just received a notification the other day for a post I had made a while ago... Upon checking the message, it was a spam comment. I am very grateful for this added functionality. Thank you.

Editor Question (Score: 1)

by evilviper@pipedot.org on 2015-01-04 22:08 (#2WNK)

What's your recommendation to editors on using, or not, the "Ban IP" option for spam posts?

Here's as good a place as any...

Re: Editor Question (Score: 1)

by bryan@pipedot.org on 2015-01-05 04:29 (#2WNQ)

Click the "Ban IP" button if it's one of those spammer bots. Otherwise, it'll just keep posting. That poor poll article I linked above had over 1,000 spam messages, all from the same French bot network.

Of course, the IP ban just prevents anonymous posts from that address. Registered users can still post through it.
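
In other words, the check is roughly the following. This is a minimal sketch, not the site's real code: banned_ips, ban_ip, and may_post are made-up names, and the address comes from a documentation-only range.

```python
from typing import Optional

banned_ips = set()  # hypothetical in-memory stand-in for the ban list


def ban_ip(ip: str) -> None:
    """What the "Ban IP" button would do in this sketch."""
    banned_ips.add(ip)


def may_post(ip: str, user: Optional[str] = None) -> bool:
    # Registered users are never blocked by an IP ban;
    # anonymous posts are refused when the address is banned.
    return user is not None or ip not in banned_ips


ban_ip("203.0.113.7")
print(may_post("203.0.113.7"))                # anonymous post from a banned IP
print(may_post("203.0.113.7", user="alice"))  # registered user, same IP
```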

Re: Editor Question (Score: 2, Insightful)

by zocalo@pipedot.org on 2015-01-05 07:51 (#2WNR)

Or the team could be more proactive on the backend. Many of the bots (or low-rent workers in 3rd world sweatshops, it's hard to tell these days) that stuff forums and submission queues seem to follow a fairly standard template, so a few well-crafted regexps combined with a tool like Fail2Ban feeding the IP blacklist might nail a lot of the low-hanging fruit before anyone even gets to see it.

Re: Editor Question (Score: 1)

by bryan@pipedot.org on 2015-01-05 08:15 (#2WNS)

I've looked into preemptive bans using existing spam databases (see http://www.stopforumspam.com/usage for an example) that use either a REST API call or a DNS lookup. However, with the current spam load I think the reactive approach is sufficient for now.

Re: Editor Question (Score: 2, Informative)

by evilviper@pipedot.org on 2015-01-05 22:47 (#2WNX)

"seem to follow a fairly standard template so a few well crafted regexp's combined with a tool like Fail2Ban feeding the IP blacklist might nail a lot of the low hanging fruit"

Doesn't sound like you've seen the kind of spam |. has been getting flooded with... It's paragraph after paragraph of random nonsense words. No pattern to it, but instead quite intentionally very random. Only commonality is that they had links in there, somewhere.

eg: http://pipedot.org/comment/2WL7

I tend to find it annoying when sites hold comments in a moderation queue, so I wouldn't like to see that happening here for every comment that happens to have a link in it... Or ones that just happen to mention "viagra".

Re: Editor Question (Score: 1)

by zocalo@pipedot.org on 2015-01-06 09:22 (#2WP0)

The sample posted earlier was the only one I'd ever seen, so I was quite surprised about the scale of the problem. Having it spammed into old threads would explain that though, which is possibly one reason why Slashdot archives older discussions. You're right about the pain of having stuff dropped into a submission queue though, and simply blocking common spam terms like "viagra" and the like is obviously going to give many false positives on a site that might discuss them, and will probably have them used in humorous comments elsewhere.

Getting back to the regexps, it's hard to say what (if anything) would work for Pipedot without a good overview of the crap being submitted, but one general technique that does seem like it would work well for typical forum spam (including your example) is to trigger off excessive use of certain punctuation marks, particularly in subjects - commas and hyphens seem well liked by many forum spammers; the one in your example put four in there. Ideally you'd probably also want to have a requirement that multiple rules match before a post goes into the moderation queue, or even a basic scoring system like SpamAssassin et al use, but based on the comments above that's probably overkill - at least at present. Ultimately though it's still an arms race, and the spammers will adapt as soon as they realise they are being blocked; sometimes you just have to go for the easy stuff and accept that the rest might need manual handling later.
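
To make that concrete, here is a rough sketch of the "multiple rules plus a score" idea. Everything in it is an assumption for illustration: the rule names, the point values, and the threshold of 3 are invented and would need tuning against real submissions. It is not how SpamAssassin or Pipedot actually score anything.

```python
import re

PUNCT_RE = re.compile(r"[,-]")                     # commas and hyphens
LINK_RE = re.compile(r"https?://", re.IGNORECASE)  # crude link detector


def count_subject_punct(subject: str) -> int:
    return len(PUNCT_RE.findall(subject))


def count_links(body: str) -> int:
    return len(LINK_RE.findall(body))


# (rule name, points, predicate) -- all values are illustrative guesses.
RULES = [
    ("SUBJECT_PUNCT", 2, lambda s, b: count_subject_punct(s) >= 4),
    ("HAS_LINK", 1, lambda s, b: count_links(b) >= 1),
    ("MANY_LINKS", 1, lambda s, b: count_links(b) >= 3),
]


def spam_score(subject: str, body: str):
    """Return (total points, names of the rules that matched)."""
    hits = [(name, pts) for name, pts, test in RULES if test(subject, body)]
    return sum(pts for _, pts in hits), [name for name, _ in hits]


def needs_review(subject: str, body: str, threshold: int = 3) -> bool:
    # With these weights no single rule reaches the threshold,
    # so at least two rules must match before a post is queued.
    score, _ = spam_score(subject, body)
    return score >= threshold


print(spam_score("Cheap, fast, easy - best, deals", "visit http://example.com"))
print(needs_review("Re: Editor Question", "See http://example.com for details"))
```

A lone link or a comma-heavy subject does not trip the filter by itself here, which is the point of scoring: an ordinary comment with one link still goes straight through.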

Re: Editor Question (Score: 1)

by beldin@pipedot.org on 2015-01-30 22:57 (#2WV6)

"Ultimately though it's still an arms race, and the spammers will adapt..."

We're on a mission: http://xkcd.com/810/