How universities' intelligent web project unlocks the information that really counts

Imagine clicking on a low point on an oil production graph to launch a web search that threw up only strictly relevant information, including news stories about the Iraq war and reports on everything from the international economy to the effects on wildlife.

This is a far cry from searching for "oil" and getting hits ranging from car engines to massage services, and it is a reality among researchers developing what is known as the semantic web.

The prospects were described by a leading researcher in this area, Nigel Shadbolt, a professor in the School of Electronics and Computer Science at Southampton University, when he presented the BCS and the Royal Signals Institution annual lecture.

Intelligent web searches would not just look for key words but would also understand what a page is about and its relevance to the user, he said.

In one project, doctors can click on a picture of a diseased area of the body and, through a link to a separate service, get analyses of the patient's MRI scans that track the progress of the disease.

"Humans can inspect web pages and draw inferences," Shadbolt said. "On a conference home page humans recognise that the title refers to an event, that the pictures are of the speakers, and that other information refers to the location and the topic.

"The ambition of the semantic web is to enable machines to draw the same inferences. So we need meta data - information about the displayed information - to sit behind the content, describing what it is about and how it relates to other objects."

Shadbolt said, "We have been used to doing this in defining web pages - all that HTML in angle brackets. Now we are seeking to develop more angle brackets to define what the machine will understand in the content."
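To make the conference page example concrete, here is a minimal sketch of meta data sitting behind a page, written in Python rather than angle brackets. It is not the project's own format: the triple layout, the identifiers such as `event:conf` and `img:photo1`, and the `query` and `speakers_pictured` helpers are all assumptions made for illustration.

```python
# Hypothetical meta data for a conference home page, held as
# (subject, predicate, object) triples. Every name here is illustrative.
TRIPLES = [
    ("page:home", "describes", "event:conf"),
    ("event:conf", "type", "Event"),
    ("event:conf", "title", "Example Web Conference"),
    ("event:conf", "location", "Example City"),
    ("event:conf", "topic", "semantic web"),
    ("img:photo1", "depicts", "person:speaker1"),
    ("img:logo", "depicts", "org:sponsor"),
    ("person:speaker1", "type", "Person"),
    ("person:speaker1", "speaksAt", "event:conf"),
]


def query(triples, subject=None, predicate=None, obj=None):
    """Return the triples that match the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


def speakers_pictured(triples, event):
    """Chain two facts: who speaks at the event, and which images depict them."""
    speakers = {s for (s, _, _) in query(triples, predicate="speaksAt", obj=event)}
    return sorted(
        image
        for (image, _, person) in query(triples, predicate="depicts")
        if person in speakers
    )


if __name__ == "__main__":
    print(query(TRIPLES, subject="event:conf", predicate="location"))
    print(speakers_pictured(TRIPLES, "event:conf"))
```

The second helper is the kind of inference a human makes at a glance: the picture is of a speaker because one fact says who is depicted and another says who is speaking.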

Shadbolt is leading an £8.8m research project involving five UK universities over six years, and work is advancing on several fronts covering the practicalities.

A web service developed at Southampton University can classify papers on computer science. It was trained on 300,000 papers from an established digital library, drawing on the authors' and editors' classifications and the library's own classification scheme, and using techniques such as machine learning and statistical analysis.

A service being developed at Sheffield University will analyse any web page and apply the meta data to objects of interest.

Storage and retrieval of the masses of meta data are being worked on, as is the idea of an information lifecycle, starting with creation and moving through use, publication, maintenance and decommissioning, said Shadbolt.

The project is also considering human issues. So far the semantic web has been used in scientific and military applications by communities of specialists in certain fields. This has revealed that there has to be agreement on terminology to get effective searches and presentation of results.

"This shows that information can be integrated using intelligent web services - and basically they all rely on lots of angle brackets of meta data that describe the information, using common shared terminology," Shadbolt said.
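The role of shared terminology can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any service in the project: the `SHARED_TERMS` vocabulary, the sample `RECORDS` and the `normalise` and `search` functions are all hypothetical.

```python
# Hypothetical agreed vocabulary: each community's local term maps to one shared term.
SHARED_TERMS = {
    "oil": "oil",
    "crude": "oil",
    "petroleum": "oil",
    "mri": "mri scan",
    "magnetic resonance image": "mri scan",
}

# Hypothetical records from two sources that label the same subject differently.
RECORDS = [
    {"source": "news", "subject": "Petroleum", "title": "Output falls"},
    {"source": "economics", "subject": "crude", "title": "Price report"},
    {"source": "medical", "subject": "MRI", "title": "Scan analysis"},
]


def normalise(term):
    """Map a local term onto the shared vocabulary; unknown terms pass through."""
    key = term.strip().lower()
    return SHARED_TERMS.get(key, key)


def search(records, term):
    """Return titles of records whose subject matches the term once both are normalised."""
    wanted = normalise(term)
    return [r["title"] for r in records if normalise(r["subject"]) == wanted]


if __name__ == "__main__":
    print(search(RECORDS, "oil"))
```

Without the shared mapping, a search for "oil" would miss both records, because neither source uses that word in its own meta data.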

"The huge growth of the web and the massive advances in technology mean that computing brute force can deliver so much content so quickly that there is not enough human processing power to go through it in detail. We are starting to see a requirement for our machines to know enough about structural descriptions to make the first cut of what we might be interested in, instead of giving all of it."
