The Semantic Web could revolutionise the way we use the Web. But in
the short term, it could boost your knowledge management systems,
says Danny Bradbury.
In spite of what many people say about search engines, they're
remarkably good when you consider the immense challenge before them
- to categorise millions of different pages with almost infinitely
diverse elements of information.
Nevertheless, at the end of the day, it still takes some trawling
through search results to find what you're looking for.
The reason we still have to do a lot of groundwork is that
information on the Web is still almost entirely in human-readable
form. Your browser can read HTML (which describes what a page should
look like) and display the information accordingly, but it has no
idea what the information means. That's the hard part and it's
still up to you to fathom it out.
This is no more than an irritation on the public Web, where
information is often easy to understand. But if you're dealing with
gigabytes of highly specialised information it can slow you down or
even become impossible. Anyone who has tried trawling through
hundreds of business reports or thousands of pieces of technical
information online will know what I mean.
Things are set to change, if Tim Berners-Lee, the inventor of the
World Wide Web, has his way. He is proposing a new concept called
the Semantic Web, which would layer more information on top of the
existing Web to make the data on it understandable by computers.
The idea is that metadata (information about the properties of the
data) would be encoded as part of existing data. The metadata would
contain information about the relationships between data items, for
example, and because computers would understand it, computer
searches would be more accurate.
This has a host of commercial implications for companies that have
difficulty managing large amounts of information. A picture
library, for example, could encode pictures so they have different
categories beyond subject.
Dr Janne Saarela, CEO of information management company Profinium,
has been researching the Semantic Web, and has also been involved
with W3C, the organisation that recommends software standards for
the Web.
He explains that the underlying language supporting the Semantic
Web (Resource Description Framework, or RDF) could be used to describe
an image and the relationship of that image to other pictures (a
previous version versus a later one, say).
Or you could encode metadata explaining that a technical drawing
shows a part that plugs into another part - and you could reference
the image of the second part.
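As a rough illustration of the kind of statements Saarela describes, the sketch below stores metadata as subject-predicate-object triples, the basic shape of an RDF statement. It uses plain Python tuples rather than a real RDF library, and every identifier in it (the image and part names, `previousVersionOf`, `depicts`, `plugsInto`) is invented for this example rather than taken from any published vocabulary.

```python
# Each statement is a (subject, predicate, object) triple, the basic
# shape of an RDF statement. All names here are hypothetical.
TRIPLES = [
    ("image:bracket-v1", "previousVersionOf", "image:bracket-v2"),
    ("image:bracket-v2", "depicts", "part:bracket"),
    ("image:housing-v1", "depicts", "part:housing"),
    ("part:bracket", "plugsInto", "part:housing"),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects(predicate, obj, triples=TRIPLES):
    """Return every subject linked to `obj` by `predicate`."""
    return [s for s, p, o in triples if p == predicate and o == obj]


def images_of_mating_parts(image):
    """Follow image -> part -> plugged-into part -> images of that part."""
    found = []
    for part in objects(image, "depicts"):
        for other in objects(part, "plugsInto"):
            found.extend(subjects("depicts", other))
    return found


print(images_of_mating_parts("image:bracket-v2"))
```

Starting from the drawing of one part, the query walks the relationships to find the image of the part it plugs into, which is the kind of link that a search on the pictures' subjects alone would miss.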
Consequently, the Semantic Web could revolutionise knowledge
management. The larger a company gets and the more diverse its
processes are, the more difficult it is to structure that
information into a meaningful form. It is very easy for the most
valuable type of data - descriptions of how information relates to
other information - to fall through the cracks.
The big question is, what types of metadata do you input about the
data within your organisation? And how is it collected?
Current thinking on the matter dictates that different subject
areas (generally defined as different market sectors) will have
their own vocabularies of meaning. For example, the financial
services sector will have different kinds of relationships between
types of data than the shipping or print media sectors will.
Proponents of the Semantic Web within these different areas are
already producing these vocabularies, or ontologies. Many
ontologies stem from basic XML data interchange standards developed
by the same groups.
Luckily, ontologies also go some way towards solving the problem of
inter-company interoperability. Entering meaning specific to your
own company is all well and good, but the next step will be
exchanging information between your company and others.
Going even further out, it would be useful to be able to search
semantically across the whole Web, although we won't be there for a
long time yet. Working groups are tackling the problem of making
different ontologies interoperable, but it's a tough nut to
crack.
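One way to picture the interoperability problem is as a translation between vocabularies. The sketch below is a deliberately simplified assumption: it treats an ontology as nothing more than a set of property names, and both the shipping terms and the shared terms they map to are invented for illustration. Real ontologies also carry relationships and constraints, which is a large part of what makes the problem hard.

```python
# Hypothetical mapping from one company's property names to a shared set.
SHIPPING_TO_SHARED = {
    "consignee": "recipient",
    "vessel": "carrier",
    "eta": "expected_arrival",
}


def translate(record, mapping):
    """Rename the keys of `record` using `mapping`.

    Keys with no known equivalent are returned separately instead of
    being guessed at, since a wrong mapping silently changes meaning.
    """
    translated, unmapped = {}, {}
    for key, value in record.items():
        if key in mapping:
            translated[mapping[key]] = value
        else:
            unmapped[key] = value
    return translated, unmapped


shipment = {"consignee": "Acme Ltd", "vessel": "MV Example", "berth": "7"}
shared, leftover = translate(shipment, SHIPPING_TO_SHARED)
print(shared)
print(leftover)
```

Keeping the unmapped fields visible, rather than dropping them, shows where two vocabularies fail to line up.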
Nevertheless, the use of Semantic Web capabilities within a
company's intranet will still bring huge benefits, as long as you
can gather enough of the required meaning.
According to John Darlington, CTO of knowledge management software
vendor Active Navigation, much of this metadata gathering can be
automated. "You could farm email trails for example, and if you
have other workflow you could farm that too - when people update
data," he says.
"Pulling it into the semantic structure is the black box where the
hard bit happens. It's a matter of accessing that. It's simply
about building filters. How we built our filters is as an
extensible set of agents."
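Darlington does not describe how Active Navigation's filters work, so the following is only a sketch of what "an extensible set of agents" might look like in general. Each filter is a small function registered in a list, and each one pulls metadata statements out of an email using Python's standard `email` module. The sample message, the filter names and the predicates (`writtenBy`, `repliesTo`) are all hypothetical.

```python
from email import message_from_string

FILTERS = []  # registry of filter functions; add more to extend it


def register(func):
    """Decorator that adds a filter to the registry."""
    FILTERS.append(func)
    return func


@register
def sender_filter(msg):
    # Record who wrote the message.
    return [(msg["Message-ID"], "writtenBy", msg["From"])]


@register
def thread_filter(msg):
    # Record which earlier message this one replies to, if any.
    parent = msg["In-Reply-To"]
    return [(msg["Message-ID"], "repliesTo", parent)] if parent else []


def farm(raw_email):
    """Run every registered filter and collect the triples they emit."""
    msg = message_from_string(raw_email)
    triples = []
    for f in FILTERS:
        triples.extend(f(msg))
    return triples


RAW = (
    "Message-ID: <2@example.com>\n"
    "In-Reply-To: <1@example.com>\n"
    "From: alice@example.com\n"
    "Subject: Re: Q3 report\n"
    "\n"
    "Updated figures attached.\n"
)

for triple in farm(RAW):
    print(triple)
```

A new kind of metadata would be supported by writing one more decorated function, leaving `farm` untouched.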
Agent technology is closely linked to the Semantic Web. As
autonomous programs that process data independently, agents made
their debut in the mid-Nineties, but their success was limited
because of the lack of semantic meaning on the Web. Shopping agents
would do searches on key words within websites, for example, which
returned inaccurate results.
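The weakness of keyword matching can be shown with a toy catalogue. In the sketch below, which uses invented product data and field names, a keyword search for "apple" returns a laptop sleeve as well as fruit, while a query that also checks explicit metadata returns only the item whose declared category is fruit.

```python
# Hypothetical catalogue: free text plus explicit metadata per page.
PAGES = [
    {"text": "Fresh apple, sold per kilo", "category": "fruit"},
    {"text": "Sleeve for your Apple laptop", "category": "accessory"},
    {"text": "Ripe pears from Kent", "category": "fruit"},
]


def keyword_search(word, pages=PAGES):
    """Match pages whose text contains `word`, ignoring case."""
    return [p["text"] for p in pages if word.lower() in p["text"].lower()]


def metadata_search(word, category, pages=PAGES):
    """Match on the keyword and on the declared category."""
    return [
        p["text"]
        for p in pages
        if p["category"] == category and word.lower() in p["text"].lower()
    ]


print(keyword_search("apple"))
print(metadata_search("apple", "fruit"))
```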
According to Steve Ross-Talbot, CEO and founder of rules processing
software company SpiritSoft, agents could dramatically improve as
the Semantic Web takes off.
He argues that as well as farming semantic meaning from corporate
content, agents can be used to navigate the Semantic Web and even
to perform transactions based on what data they find.
If and when this makes it to the wider Web, it could
revolutionise e-commerce. Until then, take some advice from us: get
to know your data!
Top 10 business uses
1. Finding technical support information for customers or
employees
2. Enabling business librarians to automatically classify
material rather than trying to do it manually
3. Creating automated, ad hoc business reports
4. Relating images to meaning in a picture library, enabling
more accurate searches
5. Structuring engineering information to make it easier to
find specific technical details
6. Building ad hoc e-learning systems by pulling different
paths of meaning out of a central core of educational
material
7. Automating online shopping using intelligent agents
8. Automating content management by being able to tell
content management systems what data actually means
9. Presenting a more coherent business portal to employees
or customers, by returning relevant content to them on the Web,
based on their personal criteria - say news articles or stock tips
that would reflect real interests, rather than being based on
keyword searches
10. Making peer-to-peer file systems easier to use by
creating a layer of meaning on top of them, so that users can find
information more readily
The Semantic Web: A glossary
XML: Extensible Markup Language, the language in which the rules
and meanings that underpin the Semantic Web are written.
Ontology: A common set of XML-based definitions addressing a
particular subject area or market sector. Ontologies provide the
vocabulary of meanings and relationships for a particular subject
(eg financial services, or shipping).
Resource Description Framework (RDF): An XML-based language used
to describe resources and their properties and values.
Agent: An autonomous program that navigates the Web to find
information.
HTML: The HyperText Markup Language. Used to
describe most data on the Web today, but has little ability to
describe what it really means.
Metadata: Data about data. Metadata is used to describe
properties of data online.