The data layer

The Semantic Web could revolutionise the way we use the Web. But in the short term, it could boost your knowledge management systems, says Danny Bradbury.

In spite of what many people say about search engines, they're remarkably good when you consider the immense challenge before them - to categorise millions of different pages with almost infinitely diverse elements of information.

Nevertheless, at the end of the day, it still takes some trawling through search results to find what you're looking for.

The reason we still have to do a lot of groundwork is that information on the Web is still almost entirely in human-readable form. Your browser can read HTML (to describe what a page should look like) and display the information accordingly, but it has no idea what the information means. That's the hard part and it's still up to you to fathom it out.

This is no more than an irritation on the public Web, where information is often easy to understand. But if you're dealing with gigabytes of highly specialised information it can slow you down or even become impossible. Anyone who has tried trawling through hundreds of business reports or thousands of pieces of technical information online will know what I mean.

Things are set to change, if Tim Berners-Lee, the inventor of the World Wide Web, has his way. He is proposing a new concept called the Semantic Web, which would layer more information on top of the existing Web to make the data on it understandable by computers.

The idea is that metadata (information about the properties of the data) would be encoded as part of existing data. The metadata would contain information about the relationships between data items, for example, and because computers would be able to interpret it, computer searches would be more accurate.

This has a host of commercial implications for companies that have difficulty managing large amounts of information. A picture library, for example, could encode pictures so they have different categories beyond subject.

Dr Janne Saarela, CEO of information management company Profinium, has been researching the Semantic Web, and has also been involved with W3C, the organisation that recommends software standards for the Web.

He explains that the underlying language supporting the Semantic Web (resource description framework or RDF) could be used to describe an image and the relationship of that image to other pictures (a previous version versus a later one, say).

Or you could encode metadata explaining that a technical drawing shows a part that plugs into another part - and you could reference the image of the second part.
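
As a rough illustration of the kind of statement Saarela describes, the sketch below stores metadata as subject-predicate-object triples, the basic shape of an RDF statement, in plain Python. The identifiers (img:bracket-drawing, previousVersionOf, plugsInto and so on) are invented for this example and are not drawn from any real vocabulary.

```python
# Each statement is a (subject, predicate, object) triple, the basic
# shape of an RDF statement. All names here are hypothetical.
TRIPLES = [
    ("img:logo-v1", "previousVersionOf", "img:logo-v2"),
    ("img:bracket-drawing", "depicts", "part:bracket"),
    ("img:housing-drawing", "depicts", "part:housing"),
    ("part:bracket", "plugsInto", "part:housing"),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects(predicate, obj, triples=TRIPLES):
    """Return every subject linked to `obj` by `predicate`."""
    return [s for s, p, o in triples if p == predicate and o == obj]


def drawings_of_mating_parts(drawing):
    """Follow drawing -> part -> mating part -> drawing of that part."""
    found = []
    for part in objects(drawing, "depicts"):
        for other in objects(part, "plugsInto"):
            found.extend(subjects("depicts", other))
    return found


print(drawings_of_mating_parts("img:bracket-drawing"))
```

Because the relationship between the two parts is recorded explicitly, a program can get from one drawing to the other without anyone having typed a matching keyword into either.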

Consequently, the Semantic Web could revolutionise knowledge management. The larger a company gets and the more diverse its processes are, the more difficult it is to structure that information into a meaningful form. It is very easy for the most valuable type of data - descriptions of how information relates to other information - to fall through the cracks.

The big question is, what types of metadata do you input about the data within your organisation? And how is it collected?

Current thinking on the matter dictates that different subject areas (generally defined as different market sectors) will have their own vocabularies of meaning. For example, the financial services sector will have different kinds of relationships between types of data than the shipping or print media sectors will.

Proponents of the Semantic Web within these different areas are already producing these vocabularies, or ontologies. Many ontologies stem from basic XML data interchange standards developed by the same groups.

Luckily, ontologies also go some way towards solving the problem of inter-company interoperability. Entering meaning specific to your own company is all well and good, but the next step will be exchanging information between your company and others.

Going even further out, it would be useful to be able to search semantically across the whole Web, although we won't be there for a long time yet. Working groups are tackling the problem of making different ontologies interoperable, but it's a tough nut to crack.
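
To see why interoperability is hard, consider two sectors that describe the same fact with different terms. A minimal sketch, using two invented vocabularies and a hand-written mapping between them, might look like this. Real ontology alignment is far less tidy, because terms rarely correspond one-to-one.

```python
# Hypothetical terms from two sector vocabularies, assumed here to mean
# the same thing. Neither list comes from a published ontology.
SHIPPING_TO_FINANCE = {
    "consignee": "beneficiary",
    "freightCharge": "transactionFee",
}


def translate(record, mapping=SHIPPING_TO_FINANCE):
    """Rename the keys a mapping covers; report the ones it does not."""
    translated, unmapped = {}, []
    for key, value in record.items():
        if key in mapping:
            translated[mapping[key]] = value
        else:
            unmapped.append(key)
    return translated, unmapped


shipment = {
    "consignee": "Acme Ltd",
    "freightCharge": 120,
    "portOfLoading": "Felixstowe",
}
print(translate(shipment))
```

The list of unmapped terms is the interesting part: every entry in it is a piece of meaning that one vocabulary can express and the other cannot.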

Nevertheless, the use of Semantic Web capabilities within a company's intranet will still bring huge benefits, as long as you can gather enough of the required meaning.

According to John Darlington, CTO of knowledge management software vendor Active Navigation, much of this metadata gathering can be automated. "You could farm email trails for example, and if you have other workflow you could farm that too - when people update data," he says.

"Pulling it into the semantic structure is the black box where the hard bit happens. It's a matter of accessing that. It's simply about building filters. How we built our filters is as an extensible set of agents."
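
One way to picture an "extensible set of agents" is as a registry of small filter functions, each of which pulls one kind of metadata out of a document. The sketch below is an assumption about how such a design might look, not a description of Active Navigation's product. The function names, the REP-1234 style of report identifier and the sample email are all invented.

```python
import re

# Registry of filter agents; new ones are added with the decorator below.
AGENTS = []


def agent(func):
    """Register a filter function so the pipeline will run it."""
    AGENTS.append(func)
    return func


@agent
def sender_agent(text):
    """Record who wrote the message, if a From: header is present."""
    match = re.search(r"^From:\s*(\S+)", text, re.MULTILINE)
    return [("author", match.group(1))] if match else []


@agent
def reference_agent(text):
    """Record any report identifiers such as REP-1234 found in the text."""
    return [("mentions", ref) for ref in re.findall(r"\bREP-\d+\b", text)]


def extract_metadata(text):
    """Run every registered agent and collect the metadata they emit."""
    metadata = []
    for run in AGENTS:
        metadata.extend(run(text))
    return metadata


EMAIL = """From: alice@example.com
Subject: Q3 figures

The numbers in REP-1042 supersede those in REP-0977."""

print(extract_metadata(EMAIL))
```

Adding a new kind of metadata means writing one more decorated function; the pipeline itself does not change.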

Agent technology is closely linked to the Semantic Web. As autonomous programs that process data independently, agents made their debut in the mid-Nineties, but their success was limited because of the lack of semantic meaning on the Web. Shopping agents would do searches on key words within websites, for example, which returned inaccurate results.
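
The weakness of keyword matching is easy to demonstrate. In the sketch below, which uses an invented two-item catalogue and hypothetical field names, a keyword search for "apple" returns both a laptop and a cookery book, whereas a query against structured metadata returns only the item whose category matches.

```python
# A tiny invented catalogue: free-text descriptions plus structured metadata.
CATALOGUE = [
    {
        "title": "Apple PowerBook G4",
        "description": "Apple laptop with a 15-inch screen",
        "category": "computer",
    },
    {
        "title": "Orchard Cookbook",
        "description": "Recipes for apple pie and apple crumble",
        "category": "book",
    },
]


def keyword_search(word, items=CATALOGUE):
    """Match on the raw text, as an early shopping agent would."""
    word = word.lower()
    return [i["title"] for i in items if word in i["description"].lower()]


def metadata_search(word, category, items=CATALOGUE):
    """Match on the text and on what the item is declared to be."""
    word = word.lower()
    return [
        i["title"]
        for i in items
        if word in i["description"].lower() and i["category"] == category
    ]


print(keyword_search("apple"))
print(metadata_search("apple", "computer"))
```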

According to Steve Ross-Talbot, CEO and founder of rules processing software company SpiritSoft, agents could dramatically improve as the Semantic Web takes off.

He argues that as well as farming semantic meaning from corporate content, agents can be used to navigate the Semantic Web and even to perform transactions based on what data they find.

If and when this makes it to the original Web, it could revolutionise e-commerce. Until then, take some advice from us: get to know your data!

Top 10 business uses
1. Finding technical support information for customers or employees
2. Enabling business librarians to automatically classify material rather than trying to do it manually
3. Creating automated, ad hoc business reports
4. Relating images to meaning in a picture library, enabling more accurate searches
5. Structuring engineering information to make it easier to find specific technical details
6. Building ad hoc e-learning systems by pulling different paths of meaning out of a central core of educational material
7. Automated online shopping using intelligent agents
8. Automating content management by being able to tell content management systems what data actually means
9. Presenting a more coherent business portal to employees or customers, by returning relevant content to them on the Web, based on their personal criteria - say news articles or stock tips that would reflect real interests, rather than being based on keyword searches
10. Making peer-to-peer file systems easier to use by creating a layer of meaning on top of them, so that users can find information more readily

The Semantic Web: A glossary
XML: Extensible Markup Language. The language in which the rules and meanings that underpin the Semantic Web are written.
Ontology: A common set of XML-based definitions addressing a particular subject area or market sector. Ontologies provide the vocabulary of meanings and relationships for a particular subject (eg financial services, or shipping).
Resource description framework (RDF): XML-based language used to describe resources and their properties and values.
Agent: An autonomous program that navigates the Web to find information.
HTML: The HyperText Markup Language. Used to describe most data on the Web today, but has little ability to describe what that data really means.
Metadata: Data about data. Metadata is used to describe properties of data online.