Unweaving the tangled web of dumb data


Semantic web technology could help firms make better use of their data and gain better search results, but don't expect to see it on the public internet any time soon

One of the biggest problems with the web and with knowledge management tools is that information is dumb. The data contained in websites and knowledge management systems does not know what it is. This makes web searches very difficult, turning up hundreds of thousands of results that are completely irrelevant or only partially related to your subject matter.

The semantic web project, initiated by Tim Berners-Lee, the creator of the world wide web, has been designed to make web and knowledge management data more intelligent. It works by encoding metadata into information that helps to describe not only that information, but its relationships with other pieces of data. In this way, you augment traditional, hyperlinked connections with a new type of semantic link. You create an invisible matrix in which information is connected by meaning.

Semantic web links can be powerful when used in a commercial context. If you operate in a vertical sector such as food production, you may have thousands of pages on your intranet detailing different aspects of your processes and products. Searching through them could be difficult, but semantically encoding them should make it easier. You will be able to start with a particular ingredient and ask the browser to find all foods that use more than 10 milligrams of that ingredient, for example. Or you may start with a finished food item and semantically browse the ingredients that constitute more than 5% of its overall make-up.
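As a rough illustration of the two queries described above, the sketch below stores a handful of invented product facts and filters them by quantity. The food names, figures and function names are all hypothetical, and plain Python stands in for a real semantic store and browser.

```python
# Hypothetical data: (food, ingredient, milligrams per serving, share of total weight)
FACTS = [
    ("oat biscuit", "salt", 12.0, 0.01),
    ("oat biscuit", "oats", 9000.0, 0.60),
    ("oat biscuit", "butter", 3000.0, 0.20),
    ("tomato soup", "salt", 8.0, 0.002),
    ("tomato soup", "tomato", 150000.0, 0.70),
]


def foods_using(ingredient, min_mg):
    """Return foods that contain more than min_mg of the given ingredient."""
    return sorted(
        {food for food, ing, mg, _ in FACTS if ing == ingredient and mg > min_mg}
    )


def major_ingredients(food, min_share):
    """Return ingredients that make up more than min_share of the given food."""
    return sorted(
        {ing for f, ing, _, share in FACTS if f == food and share > min_share}
    )


# Start from an ingredient and find the foods that use more than 10 mg of it.
print(foods_using("salt", 10))

# Start from a food and list the ingredients above 5% of its make-up.
print(major_ingredients("oat biscuit", 0.05))
```

The point of the sketch is only that both questions run against the same set of facts, from either direction, without a separate report being designed for each one.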

Although some of this work can be done in traditional relational database management systems, such structures are rigid and not easy to change, said Alfredo Morales, director of collaborative healthcare at Boston-based medical software company Clinician Support Technology. His software product, Baby CareLink, is a knowledge base designed to advise and remind clinicians dealing with premature births. It works by encoding information about each child in a semantic format.

"Semantic technology lets us establish loosely coupled relationships within the patient's information. Relational database rules would have to be hard coded and they also require hard work to maintain," he said. "Semantic technology lets the knowledge base adapt as we learn more about what is important for each particular baby."

Semantic information is encoded using a standard called the Resource Description Framework (RDF), which is usually written in XML. RDF can encode relationships between particular pieces of information.

For example, "John" could be described as "man" and linked to "Mary" with the relationship "husband of". This sounds simple, but the possible descriptions of different objects and their relationships are limitless. Companies are getting around this by developing vocabularies for particular subject areas. Called ontologies, these vocabularies often focus on vertical markets which have specific subjects and relationships. Another XML-based language, the Web Ontology Language (OWL), is used to create these ontologies.
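The John and Mary example, together with the idea of an agreed vocabulary, can be sketched in a few lines. This is not RDF or ontology syntax: it is a minimal Python model of subject, relationship and object statements, and the names Triple, VOCABULARY, check_vocabulary and objects_of are invented for illustration.

```python
from collections import namedtuple

# Each statement has three parts: a subject, a relationship, and an object.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# Hypothetical vocabulary: the only relationships this subject area allows.
VOCABULARY = {"is_a", "husband_of"}

statements = [
    Triple("John", "is_a", "man"),
    Triple("John", "husband_of", "Mary"),
]


def check_vocabulary(triples, vocabulary):
    """Return the triples whose relationship is not in the agreed vocabulary."""
    return [t for t in triples if t.predicate not in vocabulary]


def objects_of(triples, subject, predicate):
    """Return everything a subject is linked to by a given relationship."""
    return [
        t.object
        for t in triples
        if t.subject == subject and t.predicate == predicate
    ]


# Who is John the husband of?
print(objects_of(statements, "John", "husband_of"))

# A statement using a relationship outside the vocabulary is flagged.
print(check_vocabulary(statements + [Triple("John", "likes", "tea")], VOCABULARY))
```

Restricting the allowed relationships is what keeps the otherwise limitless space of descriptions manageable within one subject area.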

Semantic encoding can be particularly useful in inference engines. Encoding relationships between pieces of information enables you to analyse that information for new relationships. In our example, "Mary" may have the relationship "daughter of" with "Eric". Now, although it has not been explicitly encoded, we could infer that "Eric" has the relationship "father-in-law of" with "John". When dealing with rich sets of complex data, such capabilities can be very useful.
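The inference in this example can be sketched as a single hand-written rule applied to a set of encoded facts. A real inference engine would work from rules declared in an ontology. The function name and relationship labels here are assumptions made for illustration, and the rule assumes, as the example does, that Eric is Mary's father.

```python
# Explicitly encoded facts, as (subject, relationship, object) tuples.
facts = {
    ("John", "husband_of", "Mary"),
    ("Mary", "daughter_of", "Eric"),
}


def infer_fathers_in_law(known):
    """Apply one rule: if X is husband_of Y and Y is daughter_of Z,
    then Z is father_in_law_of X. Assumes Z is Y's father."""
    inferred = set()
    for x, first_rel, y in known:
        if first_rel != "husband_of":
            continue
        for y2, second_rel, z in known:
            if second_rel == "daughter_of" and y2 == y:
                inferred.add((z, "father_in_law_of", x))
    return inferred


# Keep only the statements that were not already encoded.
new_facts = infer_fathers_in_law(facts) - facts
print(new_facts)
```

The derived statement appears in neither of the original facts; it exists only because the two relationships were encoded in a form that a rule could combine.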

Using such technologies within the corporate firewall is one thing, but building a whole new web based on them is quite another. If we could create a second generation web using semantic technology, the benefits would be huge.

Companies such as Google, which has already made the best it can of the web's unstructured content base, could make web searches much more intelligent, returning results that would not only be more relevant, but which could then be navigated by concept, rather than by hyperlink. Imagine clicking on a piece of data and receiving a list of web-based elements that it is related to, along with a description of those relationships.

John Davies, manager of next generation web research at BT's research division BT Exact, said we are a long way from creating a semantic web. "Whether it will make the step to the external web, the jury is out. It is unlikely that anyone will turn those five billion pages into RDF any time soon," he said.

Another problem is that ontologies focus on specific areas, but the web covers all areas of information. Consequently, we must bring ontologies together. The Dublin Core Metadata Initiative has been working since 1995 to develop an infrastructure to do just that.

The semantic web is not likely to hit your browser any time soon, but the semantic intranet just might. The underlying technology has been on the agenda since the mid-to-late 1990s, but it is now starting to move from theory into commercial products as companies begin to release RDF-capable knowledge management systems and inference engines. UK-based Inference Networks is one such firm, and in the US, Amblit Technologies has a semantic browser, and Intellidimension has an RDF data management system.

The key challenge lies not just in encoding your existing data with RDF, but also in developing or finding an ontology that best suits your business. Do so, and the rewards could be high as you begin to discover all sorts of tacit information buried inside your company's knowledge base.

DCMI  http://dublincore.org/

W3C semantic web activity  www.w3.org/2001/sw/

Semantic Web Special Interest Group  http://business.semanticweb.org/

Semantic web community portal  www.semanticweb.org/
