How Bad Behavior handles false positives

December 20th, 2008 by Michael Hampton

I’m seeing increased chatter lately about Bad Behavior and concerns about so-called false positives, where a user whose comment the site owner wants is prevented from commenting. Since Bad Behavior handles this issue in a completely different manner than other solutions, I think it’s time it was addressed in detail.

Obviously no anti-spam solution is going to last long if it prevents legitimate comments, yet all of them do have false positives, to some extent, or else they are almost entirely ineffective. Every email user has had the experience of seeing expected email messages in the Junk folder, and every WordPress user has plowed through the hundreds of spams caught by Akismet or Defensio to find the one or two legitimate comments they somehow flagged. It happens.

The reason it happens is that spammers are trying to get their spam past our defenses, with varying degrees of success, by making it appear legitimate, or pulling various tricks to try to confuse the spam filters. The converse of this is that every so often, a legitimate message will look like spam. It has become, unfortunately, an arms race between spammers and the anti-spammers, with you, the hapless user, caught in the middle.

This is just one reason that Bad Behavior doesn’t look at the content of a message when trying to decide whether it is spam. Instead, I recognize that spammers attempt to trick and confuse, and so Bad Behavior looks at the metadata which accompanies each HTTP request: the headers, the IP address, and so on. Each web browser has certain identifying characteristics which can be spotted in the HTTP headers, and spammers, whose bots pretend to be actual web browsers, often don’t get all of these characteristics right. Or they relay their spam through open proxy servers or botnets which leave their own identifying characteristics on the HTTP request. Similar characteristics apply for legitimate bots, such as search engines.

By comparing the HTTP request to what is expected of a legitimate request, Bad Behavior can not only block spammers without analyzing the message content, it can in many cases block the spammers’ robots from scraping the content of your site, or block malicious attacks against your application software (e.g. WordPress) — even if the attack is previously unknown.
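
To make the idea concrete, here is a minimal sketch in Python of this kind of metadata check. The function name `find_anomalies` and the specific rules are assumptions chosen for illustration, not Bad Behavior’s actual rule set. Only the first rule comes straight from the HTTP/1.1 standard, which requires a Host header on every request; the other two are hypothetical examples of a client contradicting itself.

```python
from typing import List, Mapping


def find_anomalies(protocol: str, headers: Mapping[str, str]) -> List[str]:
    """Return the reasons a request's metadata looks inconsistent.

    Illustrative rules only; not Bad Behavior's real rule set.
    """
    # HTTP header names are case-insensitive, so normalize them first.
    h = {name.lower(): value for name, value in headers.items()}
    reasons = []

    # HTTP/1.1 requires a Host header on every request.
    if protocol == "HTTP/1.1" and "host" not in h:
        reasons.append("HTTP/1.1 request without Host header")

    # Hypothetical rule: a client claiming to be a mainstream browser
    # should also say which content types it accepts.
    if h.get("user-agent", "").startswith("Mozilla/") and "accept" not in h:
        reasons.append("browser User-Agent without Accept header")

    # Hypothetical rule: asking to both keep the connection open and
    # close it is self-contradictory.
    tokens = {t.strip().lower() for t in h.get("connection", "").split(",")}
    if {"keep-alive", "close"} <= tokens:
        reasons.append("contradictory Connection header")

    return reasons


# A well-formed browser request versus a sloppy imitation of one.
browser = {
    "Host": "example.com",
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "Accept": "text/html",
    "Connection": "keep-alive",
}
bot = {
    "User-Agent": "Mozilla/4.0 (compatible)",
    "Connection": "Keep-Alive, Close",
}

print(find_anomalies("HTTP/1.1", browser))
print(find_anomalies("HTTP/1.1", bot))
```

Note that nothing here inspects the body of the request. The message content never enters into the decision, so a spammer gains nothing by rewording the spam itself.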

Problems arise, though, when theory meets reality. In Bad Behavior’s more than three-year history, there have been many instances in which a request appeared illegitimate for what turned out to be innocuous reasons, such as improperly implemented web browsers or proxies. In all of these cases, I have either updated Bad Behavior, worked with the author of the software to help them resolve the problem, or both. For obvious reasons I strongly prefer getting the offending software fixed; it’s my opinion that software which doesn’t conform to the basic Internet standards (RFCs) should be updated to do so, and its users should demand better from those vendors or seek other solutions.

In cases where the interaction between Bad Behavior and a user’s web browser or proxy is at issue, my goal is zero false positives, even at the risk of allowing some spam to pass through. This is why I have always recommended that no anti-spam solution be used alone, and that Bad Behavior be paired with another solution which does analyze message content, such as Akismet.

There is another class of “false” positives, though, and that is those where the user being blocked represents a potential threat. This includes computers which are stuffed full of viruses, botnets and other malware, and those from which spam is already pouring forth, with or without the user’s knowledge. Bad Behavior blocks these, even though the user may be someone from whom the site owner wants to receive a comment, for the safety of both the site and the blocked user.

In one egregious case, a user visited their favorite blog, which was running Bad Behavior. But because their computer was part of a botnet sending comment spam, it then started sending spam to the blog the user had just visited! Bad Behavior blocked this, but it incidentally also blocked the user, who happened to be reading the very same blog their computer was trying to deliver spam to. Later I found out that this person refused to use any sort of anti-virus software, even though it could be downloaded for free, and did not care if their computer was sending spam. Bad Behavior will continue to block this sort of recalcitrant user, even if some would consider it a “false” positive.

In these circumstances, Bad Behavior delivers a message to the user explaining that they were temporarily blocked and giving directions on how the user can resolve the problem. In most cases it involves merely removing the malware from the computer. In the rare case that the directions given do not resolve the problem, the user also gets contact information for the site owner for further help. If this happens, the site owner can review the Bad Behavior logs to determine what might be going on, and then forward the report to me. I go over these reports and help the site owner and original user resolve the problem.

Sometimes this results in a change to Bad Behavior, sometimes a third-party program is fixed, and sometimes the user’s computer gets some advanced anti-virus help. Even with all the thousands of people using Bad Behavior, and the countless millions visiting those sites, I only get approximately one such report a week.

By way of example, the last two such reports I received were interesting. One was from a major U.S. city newspaper which uses Bad Behavior to protect its blogs. The blocked user turned out to have had a third-party software component on their computer which, while legitimate in and of itself, is popular with many malware authors. The user was still being blocked even after the malware and the third-party component were removed, because the component had left traces of itself behind. I gave the newspaper instructions on how to remove these traces, and the third-party component was also updated to no longer leave them behind.

The most recent report concerns a small business router appliance from a major vendor. Due to a design error in the router’s firmware, Bad Behavior would block accesses through this device if any of the router’s web filtering features were enabled. I was able to provide the router user with a workaround, and the vendor has opened an internal ticket to resolve the issue. A fix is expected in a future firmware release for the affected routers.

Compared to how widely Bad Behavior is used, these reports are quite rare, as even users who are blocked because their computers have viruses are almost always able to resolve the issue by themselves. Nevertheless, I take every blocked user seriously and I put forth whatever effort is needed to make sure that Bad Behavior protects your site without needlessly inconveniencing your users.