There is little point spending time and money creating a website that cannot be found. Search engine optimisation can make a site appeal to the spiders crawling the web.
Web administrators have previously been able to secure high rankings for sites on search pages by outwitting the spiders that crawl and index web pages.
Those who understand a spider's algorithm can exploit it to achieve high rankings in search engine listings. Mastery of the art means that relatively small organisations can, and regularly do, get noticed more on the web than their larger counterparts.
A spider is a program that searches for information on the web. It is used to locate HTML pages by content or by following hypertext links from page to page. Search engines use spiders to find new web pages that they summarise and add to their indexes.
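As a rough illustration of the link-following half of that description, the sketch below pulls the hyperlinks out of a single page using Python's standard library. The class name, function name and example.com addresses are made up for illustration. A real spider would also fetch the pages, queue the links it finds and obey each site's crawl rules.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the absolute targets of <a href="..."> links on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the address of the page itself.
                self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html):
    """Return every hyperlink target found in the given HTML text."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


if __name__ == "__main__":
    page = '<p>See <a href="/products.html">products</a> and <a href="about.html">about us</a>.</p>'
    for link in extract_links("http://www.example.com/index.html", page):
        print(link)
```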
Rather than relying on a quick fix by stuffing pages full of appealing keywords, search engine optimisation regularly involves IT in rethinking ways that sites can be made more appealing to a more rigorous generation of spiders, said Warren Cowan, managing director of web analytics agency Greenlight.
"The days of papering over the cracks are over. This means IT has to deal with deeper architectural questions such as, 'Do we need to change our content management system?'; 'Do we need to reprogram our web server and rewrite all our URLs?' or 'How do we optimise our dynamic templates?'"
Many of the techniques used in the early days of optimisation, such as stuffing meta tags full of keywords or using "invisible text" - keywords the same colour as the background sprinkled liberally across a web page - have long since been discovered and barred by the crawlers, said experts.
"There are two main ways of increasing your ranking with spiders or crawlers," said James Dale, director of Optimiser. "The first is to ensure you have the maximum in-bound and out-bound relevant links to your site. The second is on-page factors. You have to ensure that everything that can be crawled on the site appeals to the algorithm."
Keeping the spiders happy
Spiders like HTML and are not interested in any programming that gets in the way of their indexing text. Understanding how spiders cope with the design and navigational aspects of a site is crucial, and all too often overlooked by web designers, said Cowan.
"Spiders enter a site from the top and work their way down. If they do not find enough relevant stuff, they quickly shoot off. Unfortunately, an impressive looking site that may have cost thousands of pounds to design is not going to rank very highly. The simpler the site, the better."
Spiders ignore sites that use databases to serve up dynamic pages. Shopping cart URLs, for example, tend to contain characters or parameters required by the back-end database engine. This flags up to a spider that there are infinite permutations rather than a unique file, and so it moves on rather than risk being overrun.
Spiders also avoid indexing pages that use characters in the URL for the purpose of recording new visitors, such as those used in shopping cart systems. It is possible to keep dynamic functions on a website, but the URLs will have to be flattened into an HTML look-alike file path.
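A minimal sketch of what flattening might look like, assuming a hypothetical shopping cart script at /shop/product.php with made-up parameter names. It only shows how a database-style URL could be turned into a path that looks like a static HTML file. The matching rule that maps the flat path back to the script would live in the web server or in the content management system's rewrite facility, and is not shown.

```python
from posixpath import splitext
from urllib.parse import parse_qsl, quote, urlsplit


def flatten_url(url):
    """Turn a query-string URL into an HTML look-alike file path."""
    parts = urlsplit(url)
    # Drop the script extension: "/shop/product.php" -> "/shop/product"
    stem, _extension = splitext(parts.path)
    segments = [stem.rstrip("/")]
    # Fold each parameter name and value into the path, in their original order.
    for key, value in parse_qsl(parts.query):
        segments.append(quote(key, safe=""))
        segments.append(quote(value, safe=""))
    return "/".join(segments) + ".html"


if __name__ == "__main__":
    print(flatten_url("/shop/product.php?category=shoes&id=42"))
    # prints /shop/product/category/shoes/id/42.html
```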
The extent of a URL rewrite depends on the platform running the site, said Cowan. He warned that many content management systems do nothing to resolve their lack of visibility. "No one stopped to think whether search engines could access a content management system," he said.
It is worth checking whether a content management system offers some way of increasing the site's visibility. "Many do have a rewrite facility and so it will often be a case of fine tuning, rather than a replacement," Cowan said.
One of the best ways to make the site easy for a search engine to navigate is to keep the design simple. Adding a forum will boost its visibility and it is worthwhile e-mailing other firms in the same field to exchange relevant links. But remember, meta tags are old hat and most search engines will reject invisible text.
Optimising a site for search engines
Allow search bots to crawl your site without session IDs or arguments that track their path. These techniques are useful for tracking individual user behaviour, but the access pattern of bots is entirely different.
Using these techniques may also result in incomplete indexing of your site, as bots may not be able to eliminate URLs that look different but actually point to the same page.
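To make the duplicate-URL problem concrete, here is a small sketch that strips session-tracking arguments so that two addresses for the same page collapse into one. The parameter names in TRACKING_PARAMS are assumptions chosen for illustration; the names a given site uses will depend on its platform.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical session-tracking parameter names; adjust to match the site.
TRACKING_PARAMS = {"sessionid", "sid", "phpsessid"}


def canonical_url(url):
    """Return the URL without session arguments and with parameters in a fixed order."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]
    kept.sort()  # a stable order makes equivalent URLs compare equal
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


if __name__ == "__main__":
    first = canonical_url("http://www.example.com/catalogue?item=7&PHPSESSID=abc123")
    second = canonical_url("http://www.example.com/catalogue?sid=zzz&item=7")
    print(first)            # prints http://www.example.com/catalogue?item=7
    print(first == second)  # prints True
```

Serving links in this canonical form to bots, while keeping session arguments for ordinary visitors, is one way to follow the advice above.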
Make sure your web server supports the If-Modified-Since HTTP header. This feature allows your web server to tell Google whether your content has changed since it last crawled the site. Supporting this feature saves you bandwidth and overhead.
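The decision a server makes when it receives this header can be sketched as follows. This is an illustration of the logic only, not of any particular web server: the function name is made up, and most servers handle this automatically for static files. If the page has not changed since the date the crawler supplies, the server answers 304 Not Modified with no body; otherwise it sends the full page with status 200.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_response(if_modified_since, last_modified):
    """Return (status, headers) for a GET request.

    if_modified_since: the raw header value sent by the crawler, or None.
    last_modified: timezone-aware UTC datetime of the page's last change.
    """
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable header: ignore it and send the full page
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            # HTTP dates have one-second resolution, so drop microseconds.
            if last_modified.replace(microsecond=0) <= since:
                return 304, headers
    return 200, headers


if __name__ == "__main__":
    changed = datetime(2004, 2, 1, 10, 0, 0, tzinfo=timezone.utc)
    print(conditional_response("Sun, 01 Feb 2004 10:00:00 GMT", changed)[0])  # prints 304
    print(conditional_response("Sat, 31 Jan 2004 10:00:00 GMT", changed)[0])  # prints 200
    print(conditional_response(None, changed)[0])                             # prints 200
```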
Make use of the robots.txt file on your web server. This file tells crawlers which directories can or cannot be crawled. Make sure it is current for your site so that you do not accidentally block any crawlers.
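One way to check that a robots.txt file does what you intend is to test it against a few addresses before publishing it. The sketch below uses Python's standard urllib.robotparser module. The rules, the "ExampleBot" crawler name and the example.com addresses are invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep crawlers out of the cart and admin areas only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
Disallow: /admin/
"""


def can_crawl(robots_txt, user_agent, url):
    """Report whether the given rules allow this crawler to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for address in (
        "http://www.example.com/products/shoes.html",
        "http://www.example.com/cart/view",
    ):
        print(address, can_crawl(ROBOTS_TXT, "ExampleBot", address))
```

With these rules the product page is reported as crawlable and the cart page is not, which is a quick way to spot a rule that blocks more than intended.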
If your company buys a content management system, make sure the system can export the content so that search engine spiders can crawl your site.
This was first published in February 2004