Blocking comment spam without captchas

Imagine you want to leave a comment on a blog. Next, imagine you have to solve one of those captchas (a jumbled image of letters and numbers) to prove you’re human. Now imagine that you can’t see. You have to solve an audio captcha instead, such as the ones below. All you have to do is work out the sequence of numbers in each one:


MP3 | OGG


MP3 | OGG


MP3 | OGG

Rather tricky, huh? And these are genuine examples, not edited or downsampled.

Like most people, I needed a way to block spam on my WordPress blog but I refuse to use captchas. Spam is not my readers’ problem and as you can see, captchas are an extra hassle for everyone. Hard for people with good eyesight and even harder for people with poor or no eyesight (which includes all of us at some point in the future). Clearly we need to find a better alternative…

Question-and-answer captcha

These are of the form “what is two plus seven?” or “in which season do leaves fall?”. Because the question is plain text, a blind person can hear it read aloud by software called a screen reader, which is good, but it has to be in a language they understand, which is not good. My blog is in English and Japanese so this is a problem. There could be multiple answers, too; leaves can fall in the autumn or in the fall.

An extra form field hidden with CSS

This involves a field that should stay empty, possibly with a message saying “leave empty”. The idea is that spam bots will automatically fill in every field they come across but humans won’t. CSS can be used to hide the field from humans but it’s still “visible” to screen reader users or those with user styles applied (or none at all). Also, I tried this out for a while but in practice it wasn’t very effective — the spam bots are too smart.
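
As a rough illustration of the hidden-field idea, the server-side check can be as small as the sketch below. It is written in Python for brevity rather than as WordPress code, and the field name website_url and the form dictionaries are made up for the example.

```python
from typing import Mapping

# Hypothetical name for the hidden field; any name a bot is likely to fill in would do.
HONEYPOT_FIELD = "website_url"


def honeypot_triggered(form: Mapping[str, str], field: str = HONEYPOT_FIELD) -> bool:
    """Return True when the field that humans should leave empty contains text."""
    return form.get(field, "").strip() != ""


# A human leaves the hidden field empty; a naive bot fills in every field it finds.
human_form = {"author": "Ana", "comment": "Nice post!", "website_url": ""}
bot_form = {"author": "x", "comment": "Buy now", "website_url": "http://spam.example"}

print(honeypot_triggered(human_form))  # False
print(honeypot_triggered(bot_form))    # True
```

A check like this only catches bots that fill in every field, which fits the experience described above: smarter bots simply leave the field alone.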

Third-party spam-blocking service

Akismet is the most widely used but there are others. They work by sending each comment to a remote server where it is analysed and classified as spam or genuine. This method seems to be effective but you have to register and possibly pay depending on your spam-blocking needs. It’s also not ideal in an environment where comments should be private.

IP address blacklists

Whether using your own list or somebody else’s, it’s easy to block comments from IP addresses that are known to send spam. I don’t like this for two reasons, though:

  • Several users can share a single IP address, for example through a VPN, or spammers could be sending comments from a machine they’ve secretly gained access to. I don’t want to penalise innocent users.
  • Looking through some of my spam comments I couldn’t see any from the same IP address so I doubt this approach would be very effective.
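
For reference, the blocklist check itself really is simple. The following is a minimal Python sketch using the standard ipaddress module; the blocked ranges are documentation-only example addresses, not a real spam list.

```python
import ipaddress

# Example entries only (reserved documentation ranges), not real spam sources.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),   # a whole range
    ipaddress.ip_network("198.51.100.7/32"),  # a single address
]


def is_blocked(remote_addr: str) -> bool:
    """Return True if the commenter's address falls inside any blocked network."""
    try:
        address = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Not a valid IP address; this sketch treats it as "not blocked".
        return False
    return any(address in network for network in BLOCKED_NETWORKS)


print(is_blocked("203.0.113.42"))  # True
print(is_blocked("198.51.100.8"))  # False
```

Note that everyone behind 203.0.113.0/24 is rejected here, innocent or not, which is exactly the first objection above.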

The answer?

Then I came across some spam bot research and a nice way to block them by Ned Batchelder (and probably others too). Most spam comments are posted rapidly by scripts on remote servers, often without ever loading the comment form, so we can add a unique token to our comment form and then check it’s still there when a comment is posted. I tried this and sure enough it blocked about two-thirds of spam, which leaves room for improvement but is a good starting point. After much tweaking and experimenting I found that further analysis of each comment could catch most of the remaining spam: the delay between page load and posting, the ratio of text to links and the similarity of comment text to blog post text. Finally I’m satisfied. Out of the last 100 spam comments on this blog, 98 were detected and blocked by the filter. It’s not quite perfect and it may become less effective over time but I’m happy to make that compromise and keep my readers happy.
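
To make the token idea more concrete, here is a minimal Python sketch. It is not the plugin's actual code: the signed-timestamp token format, the SECRET_KEY, the 5-second minimum delay and the function names are all assumptions chosen for illustration. It covers the token check, the page-load-to-post delay and a simple link ratio; the similarity check is left out.

```python
import hashlib
import hmac
import re
import secrets
import time
from typing import Optional

SECRET_KEY = secrets.token_bytes(32)  # assumption: a per-site secret kept on the server
MIN_DELAY_SECONDS = 5                 # assumption: humans need at least this long to comment
LINK_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)


def _sign(payload: str) -> str:
    return hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()


def issue_token(post_id: int, now: Optional[float] = None) -> str:
    """Build the token embedded in the comment form when the page is rendered."""
    issued_at = int(time.time() if now is None else now)
    payload = f"{post_id}:{issued_at}"
    return f"{payload}:{_sign(payload)}"


def check_token(token: str, post_id: int, now: Optional[float] = None) -> bool:
    """Accept a comment only if its token is intact, matches the post and is old enough."""
    try:
        token_post_id, issued_at, signature = token.split(":")
        issued = int(issued_at)
    except ValueError:
        return False  # token missing or malformed
    expected = _sign(f"{token_post_id}:{issued_at}")
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False  # token was tampered with or made up
    if token_post_id != str(post_id):
        return False  # token was issued for a different post
    elapsed = (time.time() if now is None else now) - issued
    return elapsed >= MIN_DELAY_SECONDS  # posted too quickly means probably a bot


def link_ratio(comment: str) -> float:
    """Fraction of whitespace-separated words in the comment that are links."""
    words = comment.split()
    if not words:
        return 0.0
    return len(LINK_PATTERN.findall(comment)) / len(words)


token = issue_token(post_id=7, now=1000)
print(check_token(token, post_id=7, now=1002))  # False: posted 2 seconds after page load
print(check_token(token, post_id=7, now=1030))  # True: plausible human delay
print(link_ratio("see http://a.example and http://b.example"))  # 0.5
```

Signing the timestamp means a bot cannot simply invent an older token to get past the delay check, and a high link ratio can then be weighed alongside the token result rather than used as the only signal.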

I’ve put this together into a spam-blocking plugin for WordPress and you can see the source code here. It’s designed to be invisible for users and maintenance-free for site owners. If you try it and like it, please give it a nice review in the WordPress directory so I can get that warm fuzzy feeling!

UPDATE

Since writing this I’ve seen spam increase again despite my plugin. I was going to try to update it but I’ve since discovered (thanks Otto) another WordPress plugin, Cookies for Comments, which is much better and works using a similar concept. This is my recommendation for all WordPress users.

10 thoughts on “Blocking comment spam without captchas”

  1. Moritz

    Interesting article. I’m thinking about implementing the hidden-field solution on my website in the future.
    Blacklisting all the IPs is too much work in my opinion.
    The audio solution is nice but what about people without speakers? For example in an Internet café?

  2. littleguy

    Good ideas, clean code!

    Once you combine it with Akismet and optionally a simple captcha, it should be very effective.

    Will try this out.

    1. Daniel Post author

      Thanks for the compliment!
      I personally want to avoid using any kind of captcha but it’s certainly possible to combine this with Akismet or similar service.

  3. Julia

    Has anyone tried the Keypic solution? It doesn’t use a CAPTCHA or any other kind of test for users. Bots are detected and blocked automatically. Would be glad to have your feedback, guys!

    1. Daniel Post author

      Interesting. I’d not heard of Keypic before but looking at their site and demos it seems a bit similar to Akismet. I’d prefer not to have to rely on a third-party web service if possible but it looks like it could be useful in some cases.

      1. Julia

        Thanks, Daniel. Well, Akismet is also good but it makes you pay as soon as you start getting a lot of traffic. And Keypic is free of charge. There are positive comments around about it.

    1. Daniel Post author

      Thanks for the link – this is another new one for me. It looks like they don’t do text analysis but I’ll take a look at their code. The time display sounds interesting.
