Thought for the day: Vote for Web democracy

Research wizard Peet Morris looks at how you can influence the quality of the information you find on the Net.

From very early on, the Internet contained a fantastic amount of opinion, ideas and twaddle, all of it described as "information", yet none of it was easy to find. It was worse than a library in which the books were randomly distributed upon its shelves - you know the information you want is there somewhere, but you'll be damned if you can find it.

So, to put this mess into some semblance of order, search engines such as Lycos and AltaVista were created. These engines routinely crawl the Web, each building its own version of the much-needed index.

Before I proceed - and this is an important point - the raison d'être for some engines is simply to index everything while, for others, it's something else.

But how is that index built and how is it presented to the consumer? Commonly, to "rate" sites, the indexing mechanisms note how often a keyword appears on the site's pages - or how often it appears on a single page within the site.

So, say you're looking for information on a (mythical) anti-depressant called Bluesbeater. Typically, a site that contains many textual occurrences of the word Bluesbeater would appear at, or near, the top of the index and, therefore, your search. To put that another way, every time a keyword appears on a page, a crawler awards a vote for the page, so the more times a keyword appears, the higher the vote count.
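
To make the keyword-voting idea concrete, here is a minimal sketch in Python. It is only an illustration of the principle described above, not the code of any real search engine: the site names, the page text and the function names (keyword_votes, rank_by_keyword) are all made up for the example.

```python
import re


def keyword_votes(page_text, keyword):
    """Count whole-word, case-insensitive occurrences of keyword in page_text."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return len(re.findall(pattern, page_text, flags=re.IGNORECASE))


def rank_by_keyword(pages, keyword):
    """Return page names sorted by vote count, highest first."""
    return sorted(
        pages,
        key=lambda name: keyword_votes(pages[name], keyword),
        reverse=True,
    )


# Hypothetical pages: each occurrence of the keyword is one vote.
pages = {
    "manufacturer.example": "Bluesbeater works. Buy Bluesbeater. Bluesbeater, Bluesbeater!",
    "support-group.example": "Members discuss side-effects of Bluesbeater.",
    "unrelated.example": "Nothing to see here.",
}

ranking = rank_by_keyword(pages, "Bluesbeater")
print(ranking)
```

In this toy index the page that repeats the word most often comes first, whatever the quality of what it says.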

So, to get a site near or at the top of an index, Web developers sometimes use hidden text to "riddle" their pages with suitable keywords using so-called META tags.

To see these, go to a site and, if you're using Internet Explorer, select "View | Source" from the menu. If you see META NAME="keywords" in the page's code, that's some of the hidden text that's processed by these crawlers.
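
If you would rather not read page source by eye, a few lines of Python can pull the keywords out for you. This is a rough sketch using the standard library's html.parser module; the sample page and the class name MetaKeywordParser are invented for the example, and how a real crawler treats these tags is not something this sketch claims to reproduce.

```python
from html.parser import HTMLParser


class MetaKeywordParser(HTMLParser):
    """Collect the comma-separated keywords from META NAME="keywords" tags."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "keywords":
            content = attributes.get("content") or ""
            self.keywords.extend(
                word.strip() for word in content.split(",") if word.strip()
            )


# Hypothetical page source, as "View | Source" might show it.
sample_html = """<html><head>
<META NAME="keywords" CONTENT="Bluesbeater, anti-depressant, Bluesbeater tablets">
<title>Bluesbeater</title>
</head><body><p>Welcome.</p></body></html>"""

parser = MetaKeywordParser()
parser.feed(sample_html)
print(parser.keywords)
```

None of these keywords is visible to a visitor reading the page in a browser, which is exactly why they are so easy to stuff.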

So, in using an index that works like this, when you search for Bluesbeater you'll probably find the site with the greatest vested interest in its promotion - in other words, the site of the manufacturer.

However, "legitimate information sites" (support groups or sites noting adverse side-effects perhaps) probably don't employ such subterfuge - or at least not to the same degree. For one thing, unless their sites are built by Web-savvy developers, they simply wouldn't know about META tags and the like. Unfortunately, this sometimes means that the best information isn't so easily found in your search results.

And then Google appeared. Google's trick was to ignore how many times a site might contain a word (or how and where it was placed on its pages). Instead, Google rates a site using an extremely simple, yet ingenious, strategy.

The principle is this. Fact: good sites will be found. Fact: if these sites are really good, then other sites will link to them. Therefore, Google rates a site on how many other sites link to it. Here the voting mechanism isn't word-based; instead it's site-based, where one site votes for another.
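
Site-based voting, in its plainest form, can be sketched as a count of inbound links. The Python below is a simplified illustration of that principle only, not Google's actual code; the site names and the function name count_inbound_links are hypothetical, and the decision to ignore a site's links to itself is an assumption made for the example.

```python
from collections import Counter


def count_inbound_links(links):
    """Give each site one vote for every other site that links to it."""
    votes = Counter()
    for source, targets in links.items():
        for target in set(targets):  # repeated links from one site count once
            if target != source:     # assumption: a site cannot vote for itself
                votes[target] += 1
    return votes


# Hypothetical link map: site -> sites it links to.
links = {
    "support-group.example": ["side-effects.example"],
    "health-news.example": ["side-effects.example", "support-group.example"],
    "manufacturer.example": ["manufacturer.example"],
    "side-effects.example": [],
}

votes = count_inbound_links(links)
print(votes.most_common())
```

Here the manufacturer can repeat its own name as often as it likes and still earn no votes, because nobody else links to it.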

However, there's an extra-clever twist. If the voting site is itself highly voted for, then its votes (its links to other sites) carry more weight. Genius!
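
One way to picture weighted votes is to let every site share its current score among the sites it links to, and to repeat the sum until the scores settle. The sketch below does that in Python. It is in the spirit of the idea described here, not a copy of Google's implementation: the damping value of 0.85, the fixed 50 rounds, the handling of sites with no outbound links and all the names are assumptions made for illustration.

```python
def weighted_votes(links, damping=0.85, iterations=50):
    """Score sites so that a link from a high-scoring site counts for more."""
    sites = set(links)
    for targets in links.values():
        sites.update(targets)
    sites = sorted(sites)
    n = len(sites)

    score = {site: 1.0 / n for site in sites}  # everyone starts equal
    for _ in range(iterations):
        # Every site keeps a small base score regardless of links.
        new_score = {site: (1.0 - damping) / n for site in sites}
        for source in sites:
            targets = [t for t in set(links.get(source, [])) if t != source]
            if targets:
                # Split this site's voting power among the sites it links to.
                share = damping * score[source] / len(targets)
                for target in targets:
                    new_score[target] += share
            else:
                # Assumption: a site with no outbound links spreads its
                # voting power evenly over all sites.
                share = damping * score[source] / n
                for target in sites:
                    new_score[target] += share
        score = new_score
    return score


# Hypothetical link map: two sites with exactly one inbound link each.
links = {
    "a.example": ["popular.example"],
    "b.example": ["popular.example"],
    "c.example": ["popular.example"],
    "popular.example": ["endorsed.example"],
    "d.example": ["obscure.example"],
    "endorsed.example": [],
    "obscure.example": [],
}

scores = weighted_votes(links)
for site in sorted(scores, key=scores.get, reverse=True):
    print(f"{site}: {scores[site]:.3f}")
```

Both endorsed.example and obscure.example have a single inbound link, but the first is linked from a site that is itself well linked, so it ends up with the higher score.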

So, to conclude: it's no longer a Web-head, or how often you submit your site to a search engine, that gets your site to the top of a search - it's other sites. Create a useful site and others will find it and then link to it.

Over time, as more people find and link to it, it's rated higher and higher. In my mind, that's exactly how it should be.

Peet Morris has been a software developer since the 1970s. He is a DPhil (PhD) student at Oxford University, where he's researching Software Engineering, Computational Linguistics and Computer Science.
