As the "arms race" between spammers and spam filterers keeps ratcheting up, mail delays are likely to worsen as e-mail scanning becomes more complex.
As a result, some companies could be forced to invest in more robust hardware to keep mail flowing at acceptable speeds, researchers from HP Labs in Bristol told the 2004 Usenix Annual Technical Conference in Boston.
But there is a less expensive way to speed legitimate messages through the system: by classifying them as probable "good" or "junk" mail before they are sent to be scanned, according to a paper presented Monday afternoon titled "E-mail Prioritisation: Reducing Delays on Legitimate Mail Caused By Junk Mail."
After analysing weeks of incoming e-mail, researchers discovered that servers tend to be "faithful." In other words, if a server sent a good message before, it's likely to send a good one next time; if it sent junk before, chances are high that it's sending junk again. "New" servers that have no history of sending mail to your system are probably sending spam.
Servers can detect the sending IP address from a message header, before the full message is scanned for viruses or spam content and sent for delivery, so messages likely to be deemed "good" can go to the head of the queue for processing.
By keeping data on just 10 prior messages per e-mail-sending server and setting a threshold of at least 50% good messages from a server, the researchers were able to correctly predict junk mail 95% of the time and good messages 74% of the time.
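The classification rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the HP researchers' implementation: the class name `ServerHistory` and its methods are hypothetical, the history is keyed by sending IP address, and a server with no history is treated as probable junk, as the article describes.

```python
from collections import defaultdict, deque

HISTORY_SIZE = 10     # prior messages remembered per sending server (figure from the article)
GOOD_THRESHOLD = 0.5  # at least 50% good messages to be predicted "good"


class ServerHistory:
    """Hypothetical per-server record of recent good/junk verdicts."""

    def __init__(self, history_size=HISTORY_SIZE, threshold=GOOD_THRESHOLD):
        self.threshold = threshold
        # Each deque silently drops its oldest verdict once it is full.
        self._history = defaultdict(lambda: deque(maxlen=history_size))

    def record(self, ip, was_good):
        """Store the scanner's final verdict for a message from this server."""
        self._history[ip].append(bool(was_good))

    def predict_good(self, ip):
        """Guess whether the next message from this server is good."""
        history = self._history.get(ip)  # .get() does not create an entry
        if not history:
            return False  # assumption: unknown servers are probable junk
        return sum(history) / len(history) >= self.threshold


tracker = ServerHistory()
tracker.record("192.0.2.10", True)
print(tracker.predict_good("192.0.2.10"))   # server with a good record
print(tracker.predict_good("198.51.100.7")) # server never seen before
```

The prediction is made before scanning; the verdict passed to `record` would come from the full virus and spam scan afterwards.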
While this is not accurate enough for a spam filter, said HP's Dan Twining, "it's good enough for what we want to do" - assign messages for high- or low-priority delivery. The "junk" messages aren't spiked; rather, they're sent to the end of the queue for processing.
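The queueing idea, in which probable junk is delayed rather than discarded, might look something like the following Python sketch. The `ScanQueue` class and its method names are hypothetical, and the two-level priority scheme with first-in, first-out ordering inside each level is an assumption for illustration; the article does not describe how HP's queue was built.

```python
import heapq
import itertools

HIGH, LOW = 0, 1  # lower number is served first


class ScanQueue:
    """Hypothetical two-level queue feeding the virus/spam scanner."""

    def __init__(self):
        self._heap = []
        # The counter breaks ties so arrival order is kept within each priority level.
        self._counter = itertools.count()

    def enqueue(self, message, predicted_good):
        """Queue a message; probable junk waits behind probable good mail."""
        priority = HIGH if predicted_good else LOW
        heapq.heappush(self._heap, (priority, next(self._counter), message))

    def next_message(self):
        """Return the next message to scan, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = ScanQueue()
queue.enqueue("message from unknown server", predicted_good=False)
queue.enqueue("message from trusted server", predicted_good=True)
print(queue.next_message())  # the probable-good message is scanned first
```

Because nothing is dropped, a wrong "junk" prediction costs a legitimate message only extra waiting time, which is why a prediction too inaccurate for filtering can still be useful for prioritisation.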
With the preprocessing, HP researchers in Bristol found minor delays for good messages of perhaps two to three minutes at high loads; without it, all messages were delayed for more than 10 minutes. In the real world, e-mail was delayed for four hours during the worst of the Sobig worm attacks.
The paper was one of several presented on issues surrounding the topic "Swimming in a Sea of Data," with several others focusing on ways to maximise storage efficiency by detecting redundancies in stored data.
Sharon Machlis writes for Computerworld.