The meaning of the Web

The World-Wide Web cannot cope with semantics, but version 2.0 addresses the problem

After the rather dismal tenor of the previous two weeks' Getting Wired, it seems appropriate to consider more positive developments for the Internet. In particular, at a moment when the online world seems to be marking time, it is a good point to consider the question: where next?

To answer this, it is useful to review the Internet's last decade, and to consider why that period belongs so much to one particular technology - the World-Wide Web - and what is lacking from its current incarnation.

As everyone knows, at the beginning of the 1990s, the Internet had been around for some 20 years, but it was the arrival of an apparently specialised technology from the depths of the European Centre for Nuclear Research - better known by its French acronym Cern - that laid the foundations for what might be termed the first dotcom era.

Originally designed as a way for physicists to publish their papers electronically, the Web soon took on a life of its own well beyond its original constituency. In retrospect, it is clear that this was due to a number of factors.

The use of open standards was crucial, as it allowed anyone using any platform to create and view Web pages. The fact that all the technologies were in the public domain helped even more - there were no issues of patents or licensing to worry about. This is why the World-Wide Web Consortium's current proposal to introduce patents and licensing in the future is so wrong-headed.

The net effect of Tim Berners-Lee's far-sighted approach in those early days was to minimise the obstacles to trying out the Web.

Equally important was the fact that once people tried using the Web and creating Web documents, it immediately became apparent how easy and powerful this new medium was. This was achieved through another masterstroke of Berners-Lee: the stripping down of the underlying HTML to a bare minimum.

Previous attempts to create markup languages for the exchange of documents electronically had all foundered because of a tendency of their designers to impose a degree of rigour on users. This was understandable given the philosophy of the underlying Standard Generalised Markup Language (SGML), but meant that such markup languages took time to learn and were hard to use.

HTML, by contrast, can be learned in a few minutes, and is almost trivially easy to write. This led to an enormous flowering of Web pages, and ultimately to the whole e-commerce industry. But as the Web has matured, and more sophisticated applications of its technology have been devised, particularly for business, Berners-Lee's rough-and-ready approach has begun to show its limitations.

One of the fundamental problems is that HTML was chiefly designed for human consumption. When reading a Web page, we are able to understand that certain information refers to an address or a telephone number, for example.

But programs that might need to access pages in order to retrieve such data for further processing are unable to tell from the HTML document itself which part refers to that information. This has placed a brake on the development of advanced Web applications that are able to operate in an automated fashion - for example, agent technologies.
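The difficulty can be seen in a minimal Python sketch. The page fragment, the company details and the TagCollector class are all invented for illustration; only the standard library's html.parser module is assumed. The program recovers every piece of text, but the only label HTML attaches to each is a presentational one such as "p" or "b".

```python
from html.parser import HTMLParser

# Hypothetical page fragment: a human sees a name, an address and a phone
# number, but the markup only says "paragraph" and "bold".
PAGE = """
<p><b>Acme Widgets</b></p>
<p>12 High Street, London</p>
<p>020 7946 0000</p>
"""


class TagCollector(HTMLParser):
    """Records each piece of text together with its innermost enclosing tag."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags
        self.found = []   # (tag, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            self.found.append((self.stack[-1], text))


def collect(html):
    """Return the (tag, text) pairs found in an HTML string."""
    parser = TagCollector()
    parser.feed(html)
    parser.close()
    return parser.found


found = collect(PAGE)
for tag, text in found:
    print(tag, "->", text)
```

Nothing in the result distinguishes the address from the telephone number: both arrive labelled simply as "p". A program would have to guess from the shape of the text, which is exactly the kind of fragile guesswork that holds back automated agents.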

A partial solution to this problem is provided by Extensible Markup Language (XML), which allows information about the content of Web pages to be captured through the use of custom tags without adding a complex superstructure of the kind imposed by applications of SGML.
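As a rough illustration, here is a Python sketch using the standard library's xml.etree.ElementTree module. The record and its tag names (company, name, address, telephone) are invented for this example and do not come from any published vocabulary; the extract function is likewise a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical record using custom tags that describe the content itself.
RECORD = """
<company>
  <name>Acme Widgets</name>
  <address>12 High Street, London</address>
  <telephone>020 7946 0000</telephone>
</company>
"""


def extract(xml_text, tag):
    """Return the text of the first child element called `tag`, or None."""
    root = ET.fromstring(xml_text.strip())
    node = root.find(tag)
    return node.text if node is not None else None


print(extract(RECORD, "telephone"))
print(extract(RECORD, "address"))
```

Because the tags name the data rather than its appearance, the program can ask for the telephone number directly instead of guessing which paragraph contains it.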

With the arrival of XML, and the associated Extensible Stylesheet Language (XSL), which takes over presentation issues, the first step was taken towards allowing programs to extract information from Web pages intelligently.

But, of course, XML on its own does not encapsulate the real information in a document - its meaning - it simply allows it to be tagged more precisely. What is needed is a way of conveying something about what the XML tags refer to. To achieve this, Tim Berners-Lee has proposed the Semantic Web - effectively, version 2.0 of the World-Wide Web.
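The limitation of tags without agreed meaning can be sketched in Python as follows. Both documents and their tag names are hypothetical, as is the find_phone helper; the sketch shows only the gap, not how the Semantic Web proposes to fill it.

```python
import xml.etree.ElementTree as ET

# Two hypothetical publishers describe the same kind of fact
# with different custom tags.
DOC_A = "<company><telephone>020 7946 0000</telephone></company>"
DOC_B = "<firm><tel>020 7946 0001</tel></firm>"


def find_phone(xml_text, tag="telephone"):
    """Look up a phone number by tag name alone; return None if absent."""
    root = ET.fromstring(xml_text)
    node = root.find(tag)
    return node.text if node is not None else None


print(find_phone(DOC_A))             # found: the tag name matches
print(find_phone(DOC_B))             # None: "tel" means nothing to the program
print(find_phone(DOC_B, tag="tel"))  # found only once a human supplies the name
```

The program works on the second document only when someone tells it that "tel" and "telephone" refer to the same thing. That piece of knowledge lives outside both documents.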
