CAPTCHA and BBC iD

Rowun Giles | 06:48 UK time, Wednesday, 6 October 2010

Hi, I'm Rowun. I work in the UX&D Prototyping team.

CAPTCHAs are a big issue for websites. Using them has the potential to exclude disabled and non-disabled users alike. Our users often tell us that they don't want to see CAPTCHAs on BBC Online, and they will be pleased to see that this is still the case when they use BBC iD, our single sign-on service. I've decided to write this post to explain why this decision was made.

Captcha image from Wikipedia

Late in 2009 Judith Garman, Pekka Toppi, Lucy Dodd and I began looking into CAPTCHA technology for BBC iD and how it might affect users. We researched how CAPTCHAs are cracked, how they are implemented and where the technology is heading. We ran user tests to document the experience of using our services with a CAPTCHA and tested various solutions that could be suitable for our needs.

CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart. It's a technology that assists in discerning between human and non-human users with the goal of preventing unwanted usage of services (e.g. posting ads or spam) by non-humans.

You've probably already seen one on a registration or comment page. It's often an image of distorted text that must be typed into a box next to it. However, it could be a logic based puzzle that has to be solved or an image of an animal that needs to be selected based on a question. There are many different types of CAPTCHAs and many different variations of those types. We needed to find out which, if any, were acceptably accessible for the BBC and were a good fit for the requirements set out by BBC iD.

We started the research by looking at the CAPTCHAs that were most commonly available and potentially best suited to BBC iD. The types covered were distorted text, distorted images, 3D, logic and sound. We needed the research finished before BBC iD launched, with enough time left for the BBC iD team to implement a solution if one was chosen.

We found that most image CAPTCHAs, including "select image type" and "select the one that is a...", could either be cracked by existing software or would need an image database so large, to stop attackers logging every image, that it would be impractical. There are also obvious accessibility issues, such as vision impairment, that need to be taken into account for image CAPTCHAs. The accessibility issues and the need for constant database updates ruled out this CAPTCHA type.

It appears that, as a technology, 3D CAPTCHA is not mature enough. More information is needed about which models are easiest to interpret, how those models are best positioned, and which textures and lighting positions work best. Many of these questions will be answered as the technology matures. 3D CAPTCHA has potential because it requires interpretation, life experience and spatial awareness, all things that software will continue to have difficulty with in the near future.

Next we looked at distorted text and logic puzzles. We recognised that not all distorted text was appropriate and we weren't sure about logic puzzles. Distorted text has an advantage over most other CAPTCHAs: there is community support for users with accessibility needs, in the form of browser plug-ins and websites that can either decipher CAPTCHA text or send it to a human volunteer who deciphers it and sends the result back. This is a double-edged sword. It shows that distorted text can be cracked, but given the secrecy around the plug-in technology, the need to register for access and the limits on submissions, it is an acceptable compromise.

This mock up of what Captcha might look like on BBC iD was never used

We settled on looking into distorted-text and logic-puzzle CAPTCHAs, and we also tested sound features added to help with accessibility. After arranging a user testing session with a variety of users, with and without accessibility needs, we mocked up three types of page with CAPTCHAs (one logic puzzle and two distorted words) plus an audio component, and one page without a CAPTCHA.

The results were not unexpected. Many users did not know what a CAPTCHA was or understand why one was needed. Most users found them annoying. Visually impaired participants expected full accessibility from the BBC and felt that using CAPTCHAs would affect our reputation. Elderly users had issues with the distorted text. The logic puzzles were found to be odd and patronising. Users struggled with the audio. Overall, participants expressed extremely negative feelings towards CAPTCHA technology.

From a cracking standpoint, we found a single factor that negated both the advanced, expensive cracking software and the most advanced and resistant CAPTCHAs: companies for hire whose business is to crack CAPTCHAs using human operators.

The negative user experience that a CAPTCHA creates and the CAPTCHA cracking companies are two factors too great for us to ignore.

With all this in mind we have decided, at least for now, not to use CAPTCHA on BBC iD.

Rowun Giles is a Junior Web Developer in UXD prototyping, BBC Future Media & Technology


Comments

  • 1.

    Hey Rowun,

    Thanks for your post, but I can't quite believe you've actually run it.

    I'm about to leave my house and, by the way, I've left my front door unlocked....

    That's what you're telling us has happened to BBC iD. I agree CAPTCHA isn't perfect, and the plan for it at the BBC (as I understand it) was always only to show it to those who met some suspicious criteria - I won't, thankfully, divulge what those criteria are, but they were the result of some excellent research by some very clever people at the BBC.

    However since you've said you're not using CAPTCHA anywhere in BBC iD - I'm about to write a script to completely spam your login system.

    It'll take me 12 lines of Python. And I won't be the only one.

    Please come to your senses, and use this technology sparingly, but where necessary.

  • 2.

    Simon — surely “we aren't using CAPTCHAs” is a piece of information that's pretty trivial for any would-be spammer to discover all by themselves?

    (And also, I can't recall — is e-mail address verification a required step in the sign-up process for an iD?)

  • 3.

    Rowun - just to say I think you've made the right decision. CAPTCHAs are magnets for spammers and are crackable. I run a website that had a guest book with a good CAPTCHA mechanism, and that was being hit up to 300 times a day by spambots. (I had a bit of code that prevented them going further into the system, but that's another matter.) And if my little website attracted 300 hits a day from them, the BBC site could get millions.

    Incidentally, I've always thought the simple mechanism that some bloggers use seems to have merit - here's what Martin Belam does for example on his comment submission section:

    "Alan Turing wouldn't be impressed with this crude test - but please put 'toothpaste' into this box to prove you are a human."

    Russ

  • 4.

    Mo, just what I was thinking. You just have to register to see that we don't use CAPTCHA. We don't require email address validation unless the service you are using requires it, however.

    The super-secret criteria to trigger CAPTCHA is a nice idea but at the end of the day the determined spammer will just employ one of these services to crack CAPTCHA and if it pops up accidentally for any genuine user it will just impair their registration journey (or maybe they won't finish registering at all).

    We already take steps to prevent spamming and automatic bulk form submission so it's not like the door is wide open. I'm pleased to see some of our research being published - very useful and just the sort of thing the BBC should be doing.

  • 5.

    Russ - I've always liked Martin's approach too - it does work well for a smaller site although I'm not entirely sure if it's necessary for smaller blogs to have CAPTCHAs any more when the anti-spam services are so good.

    Outside of work I look after three different websites of varying sizes all with comments and none of which employ any CAPTCHAs, Turing tests etc and very few bits of spam get through the net.

    Meanwhile one major website I tried about a year ago proved impossible for me to register with because, try as I might, I couldn't decipher their CAPTCHAs! True they had an audio backup but that required transcribing what seemed like a 40 word sentence! I can't remember what that site was but it's a safe assumption that I don't use it regularly.

  • 6.

    Andrew - you 'ad it easy, lad. We used 't dream of transcribin 40-word sentences to get in't some places. In my day, we 'ad ter submit three essays in't latin to get through pre-mod on't Radio 3 messageboards.

    Russ

  • 7.

    One trick that you missed: CAPTCHA using recognition of BBC media content. You've -got- a vast database of audio, video and image, which is familiar to your audience but not to general spammers, assuming they are separate. "This is an elephant appearing on [Blue Peter]. What happens next?" "Who is this man and why does he no longer present [Film 2010]?" "What did you just hear Melvyn Bragg / Neil Nunes / Patrick Moore saying?"

    Defence in depth means having multiple counter-measures, which don't have to be each completely effective. Attacks on CAPTCHA are made when either the prize is worth the effort, and/or when getting past -all- the defences doesn't have an excessive cost.

    Not that I'm -demanding- CAPTCHA to log in, but it can be fun. Or it can be impossibly difficult, which is quite annoying. At least you are usually allowed more than one CAPTCHA to try if you got the first one wrong.

  • 8.

    Have you considered reCAPTCHA? It uses the process of solving CAPTCHAs to digitise books, which with appropriate explanations could mitigate the chore. And it also seems to be in the spirit of public service and will even get those spammers doing something useful.

  • 9.

    Russ - I like the idea of Martin's honey pot. On smaller sites they'd work well to combat generic spam systems. On bigger sites with custom sign in systems, chances are a custom spam system will be written that takes the honey pot into account.

    Robert - We did consider using BBC content in the way that you suggest but ultimately decided against it. Many of our users may not be regular consumers of BBC content or perhaps have just immigrated to the UK and so they don't have that content association.

    tristanf - Unfortunately with reCAPTCHA the accessibility issues still remain.

  • 10.

    I believe captchas should be installed to serve a purpose other than just fighting spam. As mentioned above, every captcha can be broken by spammers that have a small amount of skill, but I think the real reason to have them is to stop people that normally wouldn't... such as a pissed off user.

    ReCAPTCHA reads books, which is a step in the right direction, but I've taken the approach on submission forms at narwhaler (work in progress) of making it fun: I've made my own word list that it pulls from. Right now there are just generic words in it like "lovely day" and "Sunday", but I'll eventually be changing them to phrases that are likely to get someone to laugh.

    It's not brilliant by any standards, but I think it makes the much-needed captchas slightly more tolerable, and possibly something to look forward to.

  • 11.

    In case you have problems with spam, I recommend fully accessible spam filtering: Sblam! (it's a BSD-licensed server-side bayesian filter).

    While no filter is 100% accurate, it can at least greatly help moderation (rejecting obvious spam outright, whitelisting benign messages and leaving rest to moderators).

    BBC iD registration is rather lengthy, but kudos for not using CAPTCHA!

  • 12.

    I've seen a couple of fake CAPTCHAs over the last year or two. Liked the idea, but not clear on the usability:

    http://number9.hellooperator.net/wp-content/uploads/2008/07/captcha.png
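
Editor's note: the fixed-word check quoted in comment 3 ("please put 'toothpaste' into this box") is simple enough to sketch in a few lines of Python. This is only an illustration under stated assumptions, not how Martin Belam's blog or BBC iD actually works: the function names and the form field names `human_check` and `body` are hypothetical. As comment 9 points out, a spam script written for one specific site could trivially fill in the expected word, so a check like this only filters generic bots.

```python
# The word quoted in comment 3; any fixed word would do.
EXPECTED_WORD = "toothpaste"


def passes_word_check(submitted: str, expected: str = EXPECTED_WORD) -> bool:
    """Return True if the visitor typed the expected word.

    Surrounding whitespace and letter case are ignored, so a genuine
    visitor is not rejected for typing " Toothpaste ".
    """
    return submitted.strip().casefold() == expected.casefold()


def handle_comment(form: dict) -> str:
    """Hypothetical handler; 'human_check' and 'body' are assumed field names."""
    # A missing field is treated the same as a wrong answer.
    if not passes_word_check(form.get("human_check", "")):
        return "rejected"
    return "accepted"
```

Being lenient about case and whitespace is a deliberate choice here: the test is meant to stop generic bots, not to trip up people, which is the same concern the user testing in the post raised about conventional CAPTCHAs.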

 


BBC © 2011
