Herbal Vi@gra, no-prescription pills, bodily-enhancement miracles, mortgage scams, cable descramblers, non-existent lonely women fishing for a date with absolutely anyone, fake bank login reminders, designer watches and handbags and various junk you don't want. Sound familiar?
These past few weeks, we've noted a dramatic increase in spam delivered to customer and staff mailboxes on our shared hosting system. Our Support Team has heard from many customers recently who reasonably assumed our spam filtering technology was simply turned off. It was a problem, and it got noticed.
So we began incrementally increasing the sensitivity of some of the tests that help determine whether an individual email is spam. Each test assigns points, and the total represents a confidence level that the message is indeed spam. For more on how this works, see this post.
It's important to make changes gradually, because dramatic changes to the scoring system can result in "false positives", or legitimate email that gets flagged as spam, which we try to avoid for obvious reasons.
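As a rough illustration of the point-based scoring described above, here is a minimal Python sketch. The test names, point values, and the threshold of 5.0 are all made up for the example; they are not our actual rule set or cutoff.

```python
import re

# Hypothetical tests: (name, predicate, points). Not the real rule set.
TESTS = [
    ("obfuscated_brand", lambda m: re.search(r"vi@gra", m, re.I) is not None, 4.5),
    ("no_prescription", lambda m: "no prescription" in m.lower(), 3.0),
    ("account_verification", lambda m: "verify your account" in m.lower(), 2.5),
]

SPAM_THRESHOLD = 5.0  # assumed cutoff, for illustration only


def spam_score(message):
    """Sum the points of every test that matches the message."""
    return sum(points for _name, test, points in TESTS if test(message))


def is_spam(message, threshold=SPAM_THRESHOLD):
    """Flag the message when its total score reaches the threshold."""
    return spam_score(message) >= threshold


if __name__ == "__main__":
    legit = "Please verify your account details for the attached invoice"
    print(spam_score("Herbal Vi@gra, no prescription needed"))  # 7.5
    print(is_spam(legit))                  # False at the assumed cutoff
    print(is_spam(legit, threshold=2.0))   # True: a false positive
```

The last line shows why gradual changes matter: dropping the cutoff too far flags a plausible legitimate message that happens to trip a single test.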
To measure the effects of our changes, we worked this month to track statistics on the amount of mail being handled, the percentage of it which was flagged as spam, and the level of "spamminess" of messages delivered. The overall dataset we examined was over four million deliveries from July 2009.
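For readers curious how numbers like these can be derived, the sketch below aggregates a list of deliveries into per-day volume, percentage flagged, and average score. The input format (a day label paired with a score) and the flagging threshold are assumptions made for the example, not a description of our actual logging or tooling.

```python
from collections import defaultdict


def daily_stats(deliveries, threshold=5.0):
    """Summarize (day, score) pairs into per-day statistics.

    `threshold` is an assumed flagging cutoff, used only for illustration.
    """
    by_day = defaultdict(list)
    for day, score in deliveries:
        by_day[day].append(score)

    stats = {}
    for day, scores in sorted(by_day.items()):
        flagged = sum(1 for s in scores if s >= threshold)
        stats[day] = {
            "total": len(scores),                        # volume of mail
            "spam_pct": 100.0 * flagged / len(scores),   # share flagged
            "avg_score": sum(scores) / len(scores),      # "spamminess"
        }
    return stats


if __name__ == "__main__":
    sample = [
        ("2009-07-01", 1.0), ("2009-07-01", 9.0),
        ("2009-07-02", 6.0), ("2009-07-02", 8.0),
        ("2009-07-02", 0.0), ("2009-07-02", 2.0),
    ]
    for day, row in daily_stats(sample).items():
        print(day, row)
```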
Spam Surge
The first thing we've learned is that while the amount of low-scoring spam certainly increased, it wasn't just a failure of the filtering software. There was a generalized surge in overall messages received -- probably due to the avalanche of new spam!
Here's a visualization of this phenomenon that Erik made for us today:
There's one bar per day in July, and taller bars represent a higher volume of incoming mail. The darker inner bar represents the proportion of overall mail that day which was flagged as spam. July 31st is shorter because we pulled the stats about halfway through the day, but you can see that as much spam was detected in the first half of that day as all mail combined on some days earlier in the month.
You can also see that the overall volume of mail increased dramatically, and that the amount flagged as spam increased near the end of the month.
How Spammy?
We wanted to know not only that more mail was being flagged as spam, but also just how "spammy" it was. Here are two graphs that present that data.
First, a daily graph of average spam scores. For reference, a score of 8 or above is almost certainly spam.
The humps earlier in the month are weekends -- days on which there's a lot less legitimate email, so spam makes up a larger share of deliveries and pulls the daily average score up.
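The weekend effect is easy to check with some arithmetic. In the sketch below, every number is invented for illustration: we assume legitimate mail averages a score of 1, spam averages a score of 10, and spam volume stays constant while legitimate volume drops on the weekend.

```python
def average_score(legit_count, spam_count, legit_avg=1.0, spam_avg=10.0):
    """Weighted average score for a day's mail (all inputs are assumed values)."""
    total = legit_count + spam_count
    return (legit_count * legit_avg + spam_count * spam_avg) / total


if __name__ == "__main__":
    # Same amount of spam on both days; only legitimate mail changes.
    print(average_score(legit_count=600, spam_count=400))  # weekday: 4.6
    print(average_score(legit_count=200, spam_count=400))  # weekend: 7.0
```

The spam itself is no worse on the weekend; the average rises only because there is less low-scoring legitimate mail to dilute it.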
Next, here's a representation of the distribution of spam scores at several points in the month. The first pie chart represents the 12th through 17th of July. The second represents the 26th through 31st of July, and the final pie chart is just for July 31st. We made our most recent filtering improvements on July 30th.
You can see that a few weeks ago, spam scoring 8 points or higher constituted about one third of all messages. Today, that proportion is roughly two thirds. That means that a lot of previously low-scoring spam is now being scored higher.
How You Can Help
Spam apparently isn't going away; in fact, it is increasing. We believe that mailbox owners should have as much control as feasible over deliveries to their mailbox, and that's why we built the OnSite Spam Protection tool.
In our previous review of spam stats, we realized that a lot of our customers still use the default "Moderate" setting for spam filtering. Turning that up to "High" or even "Very High" is a safe bet for most people. This and other strategies are discussed in our spam FAQ.
We think the changes we made this month have improved the spam situation quite a bit. Do you agree? Please, let us know here or here.
-JM