Stopping the Spammers


With nearly one in two e-mails being junk, consumers could stop doing business electronically. Danny Bradbury finds out how spammers work and what IT departments can do to fight them.

A simple invitation to a reception for a new DEC-20 machine, sent by an engineer in 1978, got the spam ball rolling. Now spam threatens to seriously hinder e-mail as a medium for commercial communications. E-mail filtering company Brightmail has reported that 46% of all the e-mail it encounters is spam. If nearly one in two e-mails received is junk, consumers could easily revert to other means of communication. Clearly, something has to be done.

The problem is defining what spam is. Most people with no commercial interest in e-mail marketing would probably define all mass e-mailers as spammers. Monica Seeley, founder of IT consultancy Mesmo and author of a book on using e-mail correctly, believes that any mail not directly helpful to doing a job is spam.

Perhaps predictably, mass mail companies have a different view. Loren McDonald, vice-president of marketing for MailLabs, which manages mass e-mail marketing campaigns, agrees with most anti-spam advocates that a company should have a person's permission before it e-mails them. But he argues that there are several types of mass e-mailer. The kind that obtains e-mail addresses without consent - perhaps by surfing websites or buying a CD of names - and uses them to send untargeted e-mail is clearly a spammer. But there are others who are simply unaware that you should have the recipient's permission before you e-mail them, he says, and may take e-mail addresses from trade directories to build up a base of contacts. A third group is legitimate, but their e-mails are so badly designed that they are dismissed as spam.

This group is the most interesting, because it obtains opt-in consent almost by stealth. When you register your address on a website that legitimately offers a product or service, a box asking whether you want to receive related marketing information is already ticked and tucked away at the bottom of the form. More reputable companies now adopt a double opt-in system, requiring you to confirm your wishes after your initial registration, either by replying to an e-mail or by visiting a URL sent to you by e-mail.
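The double opt-in idea can be sketched in a few lines. This is a minimal illustration, not any company's actual system: the class name `DoubleOptInList` and the idea of embedding the token in a confirmation URL are assumptions. The point is that registering an address only creates a pending entry; nothing is mailed to the list address until the recipient proves control of the mailbox by returning a one-time token.

```python
import secrets


class DoubleOptInList:
    """Hypothetical mailing list that only mails addresses which have confirmed."""

    def __init__(self):
        self._pending = {}        # one-time token -> address awaiting confirmation
        self.subscribers = set()  # addresses that completed both steps

    def register(self, address):
        """Step 1: the web form is submitted. Returns a one-time token that
        would be embedded in a confirmation link e-mailed to the address."""
        token = secrets.token_urlsafe(16)
        self._pending[token] = address.strip()
        return token

    def confirm(self, token):
        """Step 2: the recipient follows the link. Unknown or reused tokens fail."""
        address = self._pending.pop(token, None)
        if address is None:
            return False
        self.subscribers.add(address)
        return True


mailing_list = DoubleOptInList()
token = mailing_list.register("alice@example.com")
print("alice@example.com" in mailing_list.subscribers)  # False: not yet confirmed
mailing_list.confirm(token)
print("alice@example.com" in mailing_list.subscribers)  # True
```

Because the token is popped on use, a confirmation link cannot be replayed, and an address typed into the form by a third party never becomes a subscriber.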

Most spammers do not make any attempt to get opt-in permission from the recipient. They rely on volume for their commercial return, and they use a variety of techniques to stop people spotting where their mails are sent from, and to avoid anti-spam software.

Jim DiDominicus, chief information security officer at the New York Board of Trade, reveals that he recently set up a secure system for a group of Florida spammers to help them foil denial of service attacks from disgruntled spam recipients. "I got to look at their tools," he explains. "There are people out there that definitely have a better understanding of sendmail than most of us do and they are able to exploit that very well."

William Plante, director of worldwide security and brand protection at Symantec, became interested in the techniques spammers use after he began to see pirate copies of his company's software being sold openly via spam. "At the worst time we were seeing tens of thousands of complaints a month," he says, adding that every copy surreptitiously purchased by Symantec was counterfeit. "It was a real threat to our business, and the threat has not gone," he says.

Getting e-mail addresses is the easiest part of the process for spammers, who can buy hundreds of thousands of them on CD for just a few dollars. These addresses are collected in a number of ways. Software robots crawl websites for e-mail addresses published as "mailto:" hyperlinks and collate them into lists. Dictionary attacks are another popular way of generating addresses: a spammer takes a domain name and automatically generates likely user names for it, such as common first names, in the hope that some will work.
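From the defender's side, a dictionary attack leaves a recognisable trace in the mail server's logs: one client asking for many different mailboxes that do not exist. The sketch below assumes the log has already been reduced to `(client_ip, recipient)` pairs for recipients rejected as unknown; the function name and the threshold of 20 are illustrative choices, not a recommended setting.

```python
from collections import defaultdict


def find_dictionary_attacks(rejections, threshold=20):
    """Flag client IPs that probed many distinct nonexistent mailboxes.

    `rejections` is an iterable of (client_ip, recipient) pairs, one per
    RCPT TO the server refused as 'user unknown' (an assumed log format).
    """
    unknown_by_ip = defaultdict(set)
    for client_ip, recipient in rejections:
        unknown_by_ip[client_ip].add(recipient.lower())
    return {ip for ip, recipients in unknown_by_ip.items() if len(recipients) >= threshold}


# One client runs through guessed names; another simply mistypes two addresses.
guesses = ["admin", "info", "sales", "john", "jane", "mike", "sarah", "david",
           "paul", "anna", "peter", "susan", "mark", "laura", "chris", "emma",
           "james", "kate", "tom", "lucy", "office"]
log = [("198.51.100.9", f"{name}@example.com") for name in guesses]
log += [("203.0.113.4", "jhon@example.com"), ("203.0.113.4", "salse@example.com")]
print(find_dictionary_attacks(log))  # {'198.51.100.9'}
```

Counting distinct recipients rather than raw rejections keeps a single misconfigured sender retrying one bad address from being flagged.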

Perhaps the most underhand way of acquiring live addresses is the false opt-out scam. Many spam mails will include a link that you can follow to get your name removed from the distribution list. In many cases they are genuine, but in some they are simply used to identify live mail addresses. This enables spammers to prioritise their e-mail targets.

Once they have obtained the addresses, spammers' most useful tools are open relays: e-mail servers that, by default, forward mail on behalf of third parties. Even today, people leave their SMTP relays open so that anyone can use them, instead of restricting them to a set of users. The Open Relay Database is a publicly available list of these relays, which systems administrators can use to identify culprits.
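Administrators can check whether their own server is an open relay by opening an SMTP session and asking it to accept a recipient in a domain it has no business handling. The sketch below uses Python's standard `smtplib`; the function names and the probe addresses (on the reserved `example.org` and `example.net` domains) are assumptions to be replaced with domains foreign to the server under test. It never sends the message body (no `DATA` command), and it should be run only against servers you administer, from a host outside their trusted network, since relaying from inside is often legitimately permitted.

```python
import smtplib


def relay_test(smtp, probe_from="probe@example.org", probe_to="probe@example.net"):
    """Return True if `smtp` accepts a recipient it should refuse to relay for.

    `smtp` is a connected smtplib.SMTP, or any object with the same methods.
    Both probe addresses are placeholders for domains the server does not serve.
    """
    smtp.ehlo_or_helo_if_needed()
    code, _ = smtp.mail(probe_from)
    if code != 250:
        return False
    code, _ = smtp.rcpt(probe_to)
    smtp.rset()  # abandon the transaction before any message is sent
    return code in (250, 251)  # third-party recipient accepted: open relay


def check_own_server(host, port=25, timeout=10):
    """Probe a server you administer, from outside its trusted network."""
    with smtplib.SMTP(host, port, timeout=timeout) as smtp:
        return relay_test(smtp)
```

This is a first-pass check only: some servers accept the recipient and refuse later, so a `False` result is not proof that a server is locked down.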

Address obfuscation is an important part of the equation when sending spam, especially in areas where commercial e-mailers are legally required to prove that the recipient has opted in. Alyn Hockey, director of research at anti-spam company Clearswift, explains that spammers use tools to fabricate e-mail headers that help hide their own addresses. "They just put in the various options to build the message and the client that sends the message does it all for them," he says. The spammer's real IP address will always be there somewhere, "but you could have started with half a dozen fake ones before you get to the legitimate one".
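Hockey's point about fake addresses follows from how mail servers record a message's path. Each server that handles a message prepends a `Received:` header, so the newest headers sit at the top and only the ones added by your own servers can be trusted; anything further down may have been fabricated by the sender. The sketch below walks the headers from the top and returns the first connecting IP address that is not one of your own relays. It assumes that each header's "from" clause records the connecting host's IP in square brackets, which is common but not universal; the function name and the trusted-relay set are hypothetical.

```python
import email
import re

# First dotted-quad in square brackets: where many MTAs record the connecting IP.
BRACKETED_IP = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def delivering_ip(raw_message, trusted_relays):
    """Return the IP that handed the message to our own servers, or None.

    Headers below that point were written by the sender or earlier hops
    and may be forged, so they are deliberately ignored.
    """
    msg = email.message_from_string(raw_message)
    for header in msg.get_all("Received", []):  # newest (topmost) first
        match = BRACKETED_IP.search(header)
        if match is None:
            continue
        ip = match.group(1)
        if ip not in trusted_relays:
            return ip
    return None


raw = "\n".join([
    "Received: from relay.ourco.example (relay.ourco.example [10.0.0.2]) by mx.ourco.example",
    "Received: from unknown (dsl-host [203.0.113.5]) by relay.ourco.example",
    "Received: from trusted-bank.example ([198.51.100.7]) by fake.example",
    "From: offers@trusted-bank.example",
    "Subject: hello",
    "",
    "body",
])
print(delivering_ip(raw, {"10.0.0.2"}))  # 203.0.113.5
```

The third header claims the mail came from 198.51.100.7, but it sits below the point where the message entered the company's servers, so it carries no weight.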

Spammers attempting to stay one step ahead in the war against unsolicited e-mail are now trying a new tactic: open proxies. Many home computers on broadband links are unprotected by firewalls, and even those that are behind firewalls can be infected by trojan programs. One recent trojan turns the host machine into an e-mail server that is then used to send spam e-mail, hiding the real sender's identity completely.

But why should spammers attempt to hide their addresses if they ultimately have to be contacted by potential customers? "What they're doing is contracting a direct mail company to send the job for them," says Chris Miller, group product manager for e-mail security at Symantec. That way they can abdicate responsibility for how the contracted company does the job, he says.

Fighting the spammers is becoming harder, but suppliers are rising to the challenge. Content scanning is a traditional way of blocking spam. Clearswift, with its Mimesweeper and Enterprisesuite products, looks for keywords in e-mails and also uses wildcards, says Hockey. It also uses reverse address look-up to try to identify mails sent from false addresses. However, both of these techniques have their problems. Spammers are beginning to misspell subjects, using "secks" instead of "sex", for example, and even wildcard checks may not pick these up.
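A toy keyword filter shows both the appeal and the weakness of wildcard matching. This is not Clearswift's implementation; the pattern list and function name are illustrative assumptions, and Python's `fnmatch` stands in for whatever wildcard syntax a real product uses.

```python
from fnmatch import fnmatchcase

# Illustrative block patterns; '*' matches any run of characters.
BLOCK_PATTERNS = ["*viagra*", "*sex*", "*free*money*"]


def matches_blocklist(subject, patterns=BLOCK_PATTERNS):
    """Return True if the lower-cased subject matches any wildcard pattern."""
    text = subject.lower()
    return any(fnmatchcase(text, pattern) for pattern in patterns)


print(matches_blocklist("FREE easy MONEY now"))  # True
print(matches_blocklist("Hot secks tonight"))    # False: the misspelling slips through
print(matches_blocklist("Essex office move"))    # True: an innocent false positive
```

The same wildcard that lets "*free*money*" catch variations also makes "*sex*" block a mail about an office in Essex, which is why content filters need careful tuning.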

Clearswift employs another technique called fingerprinting. It uses decoy e-mail accounts as spam traps, analyses the incoming mail and creates fingerprints of it, which it publishes as data files on its website. Users can then download these to update their own Clearswift server software. Brightmail, which offers anti-spam software to both enterprises and ISPs, uses a similar network of decoy addresses, whose mail is analysed by a team of experts and used to produce anti-spam rules.
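Neither company's fingerprint format is described here, so the following is only a minimal sketch of the general idea, assuming a fingerprint is a cryptographic hash (SHA-256) of the message body after normalising case and whitespace. Mail caught in the decoy accounts would populate the set of known fingerprints; the function name is hypothetical.

```python
import hashlib
import re


def fingerprint(body):
    """Hash a normalised message body so trivially reformatted copies match."""
    normalised = re.sub(r"\s+", " ", body.lower()).strip()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


# In practice this set would be built from mail caught by decoy accounts.
known_spam = {fingerprint("Buy cheap pills now")}

print(fingerprint("BUY cheap\n\npills   now") in known_spam)      # True
print(fingerprint("Buy cheap pills now ref 4821") in known_spam)  # False
```

The second lookup shows the weakness listed for fingerprinting: appending a random reference number yields an entirely different hash, so more robust schemes fingerprint parts of a message or tolerate small differences.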

Real-time blackhole lists have long been an accepted way of fighting spam. ISPs or corporate customers using this approach check the relay server from which each e-mail arrives against a published list. If that server is listed for sending large amounts of spam, indicating either that it is unprotected or that the company running it is knowingly sending spam, the message is blocked and will not be delivered.
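Most blackhole lists are queried over DNS: the sending server's IPv4 address is written with its four numbers reversed, the list's zone name is appended, and the resulting name is looked up. An answer means the address is listed; "no such name" means it is not. In the sketch below the zone `dnsbl.example.org` is a placeholder for whichever list a site subscribes to (real lists set their own usage terms), and the function names are illustrative.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up, e.g. 192.0.2.1 -> 1.2.0.192.<zone>."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone="dnsbl.example.org"):
    """True if the list returns any address record for the sending IP."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # Name not found: not listed. Note that a DNS failure also lands
        # here, so this sketch fails open rather than blocking mail.
        return False
    return True


print(dnsbl_query_name("192.0.2.1", "dnsbl.example.org"))  # 1.2.0.192.dnsbl.example.org
```

Whether to fail open or closed when the list cannot be reached is a policy choice; failing open avoids adding to the false-positive problem described below.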

The biggest problem with real-time blackhole lists is that the number of false positives - legitimate mails that do not make it through - is high. The alternative, whitelists, which let through only mail from domains the user trusts, can make the problem worse, especially as spammers are beginning to spoof legitimate corporate mails in an attempt to sneak past the lists.

DiDominicus says whitelists and blacklists are far from ideal in a corporate environment. "Whitelists are not good for corporate use because you never know who is going to try and do business with you."

Anti-spam firm Mailkey uses a modified version of the whitelist approach which chief executive officer Tim Dean-Smith thinks will put an end to the spamming problem. "We let in spam based on what we think is legitimate rather than blocking what we think is illegitimate," he says.

The twist is that when the program blocks a mail, it sends a reply to the sender asking them to confirm that the mail is genuine. Because most spam is produced automatically and carries false return addresses, these replies never reach the spammer, so a positive response usually guarantees a genuine sender, he says. The company will be releasing a corporate version of its consumer product in the next month.
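The whitelist-plus-challenge approach can be sketched as follows. This is a hypothetical illustration of the general technique, not Mailkey's software: mail from whitelisted senders is delivered, mail from anyone else is held and a one-time token is sent to the return address, and only a reply carrying that token releases the mail and whitelists the sender. The class and method names are assumptions.

```python
import secrets


class ChallengeResponseFilter:
    """Hypothetical whitelist filter that challenges unknown senders."""

    def __init__(self, whitelist=()):
        self.whitelist = {s.lower() for s in whitelist}
        self._held = {}  # token -> (sender, message) awaiting confirmation

    def receive(self, sender, message):
        """Deliver mail from known senders; hold everything else."""
        if sender.lower() in self.whitelist:
            return ("deliver", message)
        token = secrets.token_hex(8)
        self._held[token] = (sender.lower(), message)
        # The token would be mailed to the return address. A spammer's forged
        # address never receives it, so the held mail is never released.
        return ("challenge", token)

    def confirm(self, token):
        """Release a held mail and whitelist its sender; None if token unknown."""
        held = self._held.pop(token, None)
        if held is None:
            return None
        sender, message = held
        self.whitelist.add(sender)
        return message


f = ChallengeResponseFilter(["boss@example.com"])
print(f.receive("Boss@example.com", "Lunch?"))   # ('deliver', 'Lunch?')
action, token = f.receive("new@example.net", "Quote attached")
print(action)                                     # challenge
print(f.confirm(token))                           # Quote attached
```

After one successful confirmation the new correspondent goes straight through, which addresses DiDominicus's objection that businesses never know who will try to contact them, at the cost of a one-off delay.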

Perhaps the most interesting and accurate anti-spam technique today, however, is group consensus, which takes the fingerprinting concept a stage further. Cloudmark, formed in 2000, uses the Vipul's Razor algorithm, developed by co-founder Vipul Ved Prakash, who used it with friends to try to reduce his personal spam intake.

A plug-in for Microsoft Outlook checks incoming mails against a central server holding fingerprints of known spam. If a mail slips through, a "block" button lets the user manually classify it as spam, fingerprinting the mail and uploading the fingerprint to the central server. The server then automatically evaluates the report against certain criteria, including the user's past reliability in classifying spam.
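Cloudmark's actual scoring is not described beyond "the user's past reliability", so the following is only a sketch of one plausible scheme: a reliability-weighted vote. Each user's latest report on a fingerprint counts once, weighted by how accurate that user has been; "not spam" votes subtract. The function name, the default weight for unknown users and the threshold are all hypothetical values chosen for illustration.

```python
def consensus_verdict(votes, reliability, threshold=2.0, default_weight=0.1):
    """Decide whether a fingerprinted mail is spam from user reports.

    votes: iterable of (user_id, says_spam) pairs; each user's latest vote wins.
    reliability: user_id -> weight reflecting that user's past accuracy.
    """
    latest = dict(votes)  # one vote per user, so nobody can vote repeatedly
    score = 0.0
    for user, says_spam in latest.items():
        weight = reliability.get(user, default_weight)  # newcomers count little
        score += weight if says_spam else -weight
    return score >= threshold


reliability = {"ann": 0.9, "ben": 0.8, "cat": 0.7}
print(consensus_verdict([("ann", True), ("ben", True), ("cat", True)], reliability))  # True
print(consensus_verdict([("ann", True), ("ann", True), ("ann", True)], reliability))  # False
```

Weighting by track record is what stops a single user, or a spammer's own accounts, from blocking legitimate mail by pressing the button repeatedly.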

One of the benefits of this system is that it cuts through the whole tangled mess of what is and is not spam. The program lets the user community - currently numbering over 400,000 - decide for itself. In the future, says CEO Karl Jacob, Cloudmark will introduce a version of the system that gives users more options, using its consensus model to identify companies that offer genuine opt-out and providing a separate opt-out button within Outlook so that users can unsubscribe from those companies' lists.

However, the group consensus model requires communication between the corporate software and its central server, which Jacob believes could present security concerns for corporate customers. So Cloudmark's Authority product uses yet another anti-spamming technique, involving a modified Bayesian algorithm. Bayesian filtering applies Bayes' theorem from probability theory: it estimates how likely a message is to be spam from how often its features have appeared in mail previously classified as spam or as legitimate.
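Cloudmark's modifications are not described, so the sketch below is the plain textbook version: a multinomial naive Bayes classifier over words, with add-one (Laplace) smoothing so that a word never seen in one class does not drive its probability to zero. It works in log space to avoid numerical underflow on long messages. The class name, tokeniser and training examples are illustrative assumptions.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


class NaiveBayesFilter:
    """Textbook word-based naive Bayes spam filter (illustrative only)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record a message already classified as 'spam' or 'ham'."""
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def spam_probability(self, text):
        """Estimate P(spam | words) via Bayes' theorem."""
        vocab_size = max(len(set(self.word_counts["spam"]) | set(self.word_counts["ham"])), 1)
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # log prior P(label), smoothed so an untrained class is not impossible
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            for word in tokenize(text):
                # log P(word | label) with add-one smoothing
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            log_scores[label] = score
        # Normalise the two log scores back into a probability.
        top = max(log_scores.values())
        spam = math.exp(log_scores["spam"] - top)
        ham = math.exp(log_scores["ham"] - top)
        return spam / (spam + ham)


f = NaiveBayesFilter()
f.train("cheap viagra buy now", "spam")
f.train("buy cheap pills now", "spam")
f.train("meeting agenda for monday", "ham")
f.train("project report attached for review", "ham")
print(f.spam_probability("cheap pills") > 0.5)            # True
print(f.spam_probability("monday meeting report") > 0.5)  # False
```

The result is a probability rather than a yes/no answer, which is the "works on probabilities rather than certainties" trade-off noted in the table below; a site chooses its own cut-off.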

One of the disadvantages of running an in-house, server-based system such as Authority or Enterprisesuite, as opposed to an outsourced, ISP-based service such as Brightmail, is the processing overhead it imposes. New York Board of Trade's DiDominicus says, "I think the outsourced services did a better job and the performance was better on our end."

As spammers continue to develop new techniques, anti-spamming software will also become more innovative. One of the great things about consensus computing is that it uses what Sun Microsystems calls the net effect - the idea that the power of a network increases with the number of people using it. Perhaps now, as spam mail reaches the point where it begins to represent a serious threat rather than a mere irritation, we have finally found a way to dismiss it once and for all.

Legislation in the pipeline   

Until recently, anti-spam legislation in Europe has been unsatisfactory to lobbyists because of the lack of an opt-in law. Opt-in restricts commercial e-mailers to sending mail only to people with whom they have an existing relationship and who have given permission to be mailed. Opt-out simply requires senders to stop sending unsolicited mail once the recipient asks them to. The DTI has been consulting on how to implement the EC Directive on Privacy and Electronic Communications, which applies opt-in rules to commercial e-mail. The consultation period has just finished, and the directive is due to become UK law by 31 October.

The problem is that most spam originates outside Europe, making the law difficult to enforce, says David Naylor, a partner at UK technical specialist solicitors Morrison and Foerster. In the US, the Can-Spam Bill 2003, originally proposed by Senator Conrad Burns, would impose strict controls on unsolicited commercial e-mail, including prohibiting deceptive subject lines and requiring opt-out instructions. The Computer Owners' Bill of Rights, introduced to the Senate in March, asks the Federal Trade Commission to establish a "do not e-mail" registry of opted-out addresses.

Top 10 spam subject lines   

E-mail management company Surf Control found that these were the most common spam subject lines in 2002.

1. XXX Your free adult sites password 

2. Check out our new lower prices. Many "drug" types available. (Viagra) 

3. Get cash out! Refinance while rates are still low 

4. Urgent and confidential (Nigerian hoax) 

5. Remote control car the size of a hot wheel! 

6. Rated #1 best online casino 

7. #1 Pasta pot as seen on TV 

8. Get out of credit card debt 

9. Meet singles in your area 

10. Copy any DVD in one click.

Pros and cons of anti-spam techniques

Real-time blackhole lists

Pros: Effective for stopping spam from known open relays 

Cons: Can result in many false positives if used on its own

Whitelists

Pros: Allows addresses only from known senders 

Cons: Potential for many false positives   

Bayesian analysis 

Pros: Analyses structure of mail without relying on content 

Cons: Works on probabilities rather than certainties   

Content filtering 

Pros: Looks for obvious words or phrases in content. Intelligent use can minimise false positives 

Cons: Easy to circumvent with incorrect spellings and innovative content structuring

Fingerprinting

Pros: Creates a unique identifier for a spam mail, almost like a virus signature 

Cons: Each spam can be slightly changed, which can confuse the more basic fingerprinting algorithms   

Consensus filtering 

Pros: Manual element combined with group consensus makes this approach very accurate 

Cons: Communication with back-end server is bandwidth-heavy and may concern security-conscious corporates.
