The World-Wide Web cannot cope with semantics, but version 2.0
addresses the problem
After the rather dismal tenor of the previous two weeks' Getting
Wired, it seems appropriate to consider more positive developments
for the Internet. In particular, at a moment when the online world
seems to be marking time, it is a good point to consider the
question: where next?
To answer this, it is useful to review the Internet's last decade,
and to consider why that period belongs so much to one particular
technology - the World-Wide Web - and what is lacking from its
current incarnation.
As everyone knows, at the beginning of the 1990s, the Internet had
been around for some 20 years, but it was the arrival of an
apparently specialised technology from the depths of the European
Centre for Nuclear Research - better known by its French acronym
Cern - that laid the foundations for what might be termed the first
dotcom era.
Originally designed as a way for physicists to publish their papers
electronically, the Web soon took on a life of its own well beyond
its original constituency. In retrospect, it is clear that this was
due to a number of factors.
The use of open standards was crucial as it allowed anyone using
any platform to create and view Web pages. The fact that all the
technologies were in the public domain helped even more - there
were no issues of patents or licensing to worry about. This is why
the World-Wide Web Consortium's current proposal to introduce patent
and licensing requirements in the future is so wrong-headed.
The net effect of Tim Berners-Lee's far-sighted approach in those
early days was to minimise the obstacles to trying out the
Web.
Equally important was the fact that once people tried using the Web
and creating Web documents, it immediately became apparent how easy
and powerful this new medium was. This was achieved through another
masterstroke of Berners-Lee: the stripping down of the underlying
HTML to a bare minimum.
Previous attempts to create markup languages for the exchange of
documents electronically had all foundered because of a tendency of
their designers to impose a degree of rigour on users. This was
understandable given the philosophy of the underlying Standard
Generalised Markup Language (SGML), but meant that such markup
languages took time to learn and were hard to use.
HTML, by contrast, can be learned in a few minutes, and is almost
trivially easy to write. This led to an enormous flowering of Web
pages, and ultimately to the whole e-commerce industry. But as the
Web has matured, and more sophisticated applications of its
technology have been devised, particularly for business,
Berners-Lee's rough-and-ready approach has begun to show its
limitations.
One of the fundamental problems is that HTML was chiefly designed
for human consumption. When reading a Web page, we are able to
understand that certain information refers to an address or a
telephone number, for example.
But programs that might need to access pages in order to retrieve
such data for further processing are unable to tell from the HTML
document itself which part refers to that information. This has
placed a brake on the development of advanced Web applications that
are able to operate in an automated fashion - for example, agent
technologies.
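
The difficulty can be sketched with a few lines of Python. The page fragment below is invented for illustration, and the class and function names are hypothetical; only the standard library's html.parser module is assumed. All the program sees is presentational tags and runs of text, with nothing to mark which run is the address and which the telephone number.

```python
from html.parser import HTMLParser

# Hypothetical contact page fragment, invented for illustration.
HTML_PAGE = """
<p><b>Acme Ltd</b><br>
12 High Street, London<br>
020 7946 0000</p>
"""


class TextCollector(HTMLParser):
    """Records the tag names and text runs that a program encounters."""

    def __init__(self):
        super().__init__()
        self.tags = set()
        self.texts = []

    def handle_starttag(self, tag, attrs):
        self.tags.add(tag)

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.texts.append(text)


def inspect_html(page):
    """Return the set of tags and the list of text runs found in the page."""
    collector = TextCollector()
    collector.feed(page)
    collector.close()
    return collector.tags, collector.texts


tags, texts = inspect_html(HTML_PAGE)
print(sorted(tags))  # presentational tags only
print(texts)         # three text runs, none labelled by the markup
```

A human reader recognises the last line as a telephone number at a glance; the program would have to guess from the shape of the text.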
A partial solution to this problem is provided by Extensible Markup
Language (XML), which allows information about the content of Web
pages to be captured through the use of custom tags without adding
a complex superstructure of the kind imposed by applications of
SGML.
With the arrival of XML, and the associated Extensible Stylesheet
Language (XSL), which takes over presentation issues, the first
step was taken towards allowing programs to extract information
from Web pages intelligently.
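
As a rough illustration of what custom tags make possible, here is the same kind of contact information marked up in XML and read with Python's standard xml.etree.ElementTree module. The tag names (contact, address, telephone) and the extract_field helper are assumptions made up for this sketch, not part of any published vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document; the tag names are invented for illustration.
XML_PAGE = """
<contact>
  <name>Acme Ltd</name>
  <address>12 High Street, London</address>
  <telephone>020 7946 0000</telephone>
</contact>
"""


def extract_field(document, tag):
    """Return the text of the first child element called `tag`, or None."""
    root = ET.fromstring(document.strip())
    element = root.find(tag)
    return element.text if element is not None else None


print(extract_field(XML_PAGE, "telephone"))
print(extract_field(XML_PAGE, "address"))
```

Because the markup now names each item, the program can ask for the telephone number directly instead of guessing which line of text it might be.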
But, of course, XML on its own does not encapsulate the real
information in a document - its meaning - but simply allows the
content to be tagged more precisely. What is needed is a way of conveying
something about what the XML tags refer to. To achieve this, Tim
Berners-Lee has proposed the Semantic Web - effectively, version
2.0 of the World-Wide Web.
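
The gap that XML leaves can be shown with a small sketch. Two hypothetical documents label a telephone number with different tags; a program looking for one tag name knows nothing about the other. The hand-written TAG_MEANINGS table below is purely an assumption for illustration, standing in for some shared statement of what the tags refer to; it is not the mechanism of the Semantic Web proposal itself.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents that tag the same kind of data differently.
DOC_A = "<contact><telephone>020 7946 0000</telephone></contact>"
DOC_B = "<kontakt><tel>020 7946 0001</tel></kontakt>"

# Hypothetical hand-written table tying each local tag to a shared concept.
TAG_MEANINGS = {
    "telephone": "phone-number",
    "tel": "phone-number",
}


def find_by_tag(document, tag):
    """Return the text of every element whose name is exactly `tag`."""
    root = ET.fromstring(document)
    return [element.text for element in root.iter(tag)]


def find_by_meaning(document, concept):
    """Return the text of every element whose tag maps to `concept`."""
    root = ET.fromstring(document)
    return [
        element.text
        for element in root.iter()
        if TAG_MEANINGS.get(element.tag) == concept
    ]


print(find_by_tag(DOC_B, "telephone"))         # the tag name alone finds nothing
print(find_by_meaning(DOC_B, "phone-number"))  # the mapping supplies the meaning
```

Without the table, the two tag names are just different strings to the program; someone has to state, in a form software can read, that they refer to the same thing.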