October 18th, 2003

Problem with blacklist


Jay Allen, the author of MT-blacklist, commented in a discussion at Q Daily News:

While I will agree that many blacklist implementations and models are flawed, I still haven’t heard a valid criticism specifically of MT-Blacklist’s implementation (other than a couple of bugs which will be ironed out in the next version), but would be happy to hear some and adapt the program as necessary to best serve the needs of the community.

I took a quick look at MT-Blacklist. The whole logic for determining whether a comment or ping is spam comes down to this:

# For each blacklisted pattern, do a case-insensitive match anywhere
# in the text; the first hit marks the whole thing as spam.
foreach my $deny (@blacklisted_strings) {
  if ($str =~ m#$deny#i) {
    return $config->{logDenials} ? (1, $deny) : 1;
  }
}

Translation: do a case-insensitive check on every blacklisted word and see if it appears anywhere in the comment or ping. If it does, the comment is spam.

So, suppose I have “porn” as a blacklisted word (which I almost certainly will); then a discussion of “Should we ban porn in Singapore?” becomes impossible to carry out.
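To make the false positive concrete, here is a minimal sketch of my own (not MT-Blacklist code; the variable names simply mirror the snippet above) showing the same substring match condemning that legitimate sentence:

# Hypothetical demonstration: the match knows substrings, not context.
my @blacklisted_strings = ('porn');
my $str = 'Should we ban porn in Singapore?';

foreach my $deny (@blacklisted_strings) {
    print "spam (matched '$deny')\n" if $str =~ m#$deny#i;
}
# prints: spam (matched 'porn')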

The simple-mindedness of blacklist logic is the problem, whether it is an IP blacklist or a content blacklist. A Bayesian filter, on the other hand, analyses the whole content and gives you a probability that it is spam. Not perfect, but at least it is not 1 or 0. Life isn’t binary…
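To see the difference, here is a rough sketch of the Bayesian idea in the spirit of Paul Graham’s “A Plan for Spam”. The word counts and corpus sizes are made-up stand-ins, and this is my own illustration, not code from any actual filter:

use strict;
use warnings;

my %good = (porn => 5, singapore => 20, ban => 10);   # hypothetical word counts in ham
my %bad  = (porn => 40, viagra => 60, casino => 30);  # hypothetical word counts in spam
my ($ngood, $nbad) = (200, 150);                      # messages in each corpus

# Per-word probability that a message containing the word is spam.
sub spam_probability {
    my $word = lc shift;
    my $g = ($good{$word} || 0) / $ngood;
    my $b = ($bad{$word}  || 0) / $nbad;
    return 0.4 unless $g || $b;   # unseen words get a mildly innocent score
    my $p = $b / ($g + $b);
    $p = 0.01 if $p < 0.01;       # clamp so no single word decides the verdict
    $p = 0.99 if $p > 0.99;
    return $p;
}

# Combine the per-word scores into one probability for the whole message.
sub message_probability {
    my ($prod, $inv) = (1, 1);
    for my $word (@_) {
        my $p = spam_probability($word);
        $prod *= $p;
        $inv  *= 1 - $p;
    }
    return $prod / ($prod + $inv);
}

printf "%.2f\n", message_probability(qw(should we ban porn in singapore));  # prints 0.00

With these made-up counts, the innocuous words drag the combined score down to nearly zero, so the Singapore question gets through even though “porn” alone scores high. That graded judgment is exactly what a 1-or-0 blacklist cannot give you.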

For further discussion, see Paul Graham’s note on Bayesian vs. blacklist filtering.

ps: Please don’t get me wrong. I think Jay should be commended for his effort to fight comment spam. But that does not mean I agree with the notion of blacklists.
