The Semantic Web - Is It Worth It? (A guest blog)

I have watched attempts to produce automated means of tracking and tracing the provenance of on-line data for well over a decade – as a succession of snake-oil salesmen have tried to persuade naive users and politicians that their mash-up tools will turn an “on-line waste tip of unvalidated government data files” into something more than e-slurry.

I had hoped to have a speaker on progress with the Semantic Web at the recent “Uncovering the truth” workshop on data quality organised by the Information Society Alliance (EURIM) and the Audit Commission, because I had long thought it provided part of the “answer”.

However, Sean Barker has suggested that it is little more than the latest excuse for not applying traditional data standards: an expensive academic exercise that will lead nowhere. I therefore asked him to do a “guest blog”. I will not comment further and await your comments.



“In a service-based contract, the user effectively hires some equipment, and every so often, sends it back to the supplier for maintenance. In this context, some years ago I was party to a discussion on data quality for feedback data. When a civil servant in the room realised that we wanted to assess government departments, with an oleaginous smile, he observed that as they set the contract, it was up to them what quality of data they would provide. To which an industrialist said, “Yes, indeed, but you realise we will charge you a risk premium against our costs for your bad data?” The oleaginous smile more than disappeared. Data quality is something that has to be measured in pounds and pence. But what has this to do with the Semantic Web?


The Semantic Web is a marvel that will allow you to ask your phone for “a decent Chinese restaurant near Covent Garden tonight”, and it will come back with a list of restaurants, ranked by proximity, having filtered out those with poor ratings or no tables. All you need to do then is pick your preference, and it will book for you. If you think an app like this is a long way off, then check out Siri (though only in the US as yet). However, somewhere under the hood are hand-written translators which integrate the various services that the app uses – semantic web informed perhaps, but not actually Semantic Web.
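The filter-then-rank behaviour described above is straightforward once the services have been integrated; the hard part is the integration, not the logic. A minimal sketch, with all restaurant data and thresholds invented for illustration:

```python
# Hypothetical sketch of the "decent Chinese restaurant" query:
# filter out poor ratings and fully booked venues, then rank by proximity.
# All data and the 3.5-star threshold are invented for illustration.

restaurants = [
    {"name": "Golden Dragon", "rating": 4.5, "tables_free": True,  "distance_km": 0.3},
    {"name": "Lucky House",   "rating": 2.8, "tables_free": True,  "distance_km": 0.2},
    {"name": "Jade Palace",   "rating": 4.1, "tables_free": False, "distance_km": 0.1},
    {"name": "Red Lantern",   "rating": 4.0, "tables_free": True,  "distance_km": 0.9},
]

def decent_and_available(r, min_rating=3.5):
    """Keep only well-rated restaurants with a free table."""
    return r["rating"] >= min_rating and r["tables_free"]

ranked = sorted(
    (r for r in restaurants if decent_and_available(r)),
    key=lambda r: r["distance_km"],
)

print([r["name"] for r in ranked])  # nearest acceptable option first
```

The ten lines of logic are trivial; what the hand-written translators do is get four different services' data into a shape where a filter like this can run at all.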


The myth of the Semantic Web is that it will be the silver bullet that solves all data interoperability problems automagically. The reality is that it will solve a number of very specific problems, but on the Web, what will cripple it is data quality. This is not the simple problem of data errors, but goes to the heart of much that is wrong with the Semantic Web. Computers are always part of a system that involves human goals and aspirations, and yet the semantics of the Semantic Web is only a mathematical exactitude about the relationship between two otherwise undefined symbols. To make those symbols useful, somebody has to use their brain, and make sure that the symbols mean exactly what they are supposed to mean – which is where data quality comes in.
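The point about mathematically exact but otherwise undefined symbols can be made concrete. In the sketch below (all identifiers invented for illustration), a machine can validly chain two triples into an inference, yet nothing in the formalism checks whether "DecentRestaurant" means what anyone intended:

```python
# To the machine, Semantic Web "semantics" is only the formal relationship
# between symbols. These triples are logically fine, and the inference is
# valid, but whether the symbols mean what we suppose is a human problem.
# All identifiers are invented for illustration.

triples = [
    ("ex:GoldenDragon", "rdf:type", "ex:DecentRestaurant"),
    ("ex:DecentRestaurant", "rdfs:subClassOf", "ex:Restaurant"),
]

# A trivial subclass inference: X type A, A subClassOf B  =>  X type B.
inferred = [
    (s, "rdf:type", parent)
    for (s, p, cls) in triples if p == "rdf:type"
    for (c, q, parent) in triples if q == "rdfs:subClassOf" and c == cls
]

print(inferred)  # [('ex:GoldenDragon', 'rdf:type', 'ex:Restaurant')]
```

The machinery works perfectly; it is the mapping from `ex:DecentRestaurant` to anyone's idea of a decent restaurant that requires a brain.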


Data quality is about ensuring that data means exactly what it says. Unfortunately that means that everybody who sees the data should understand it in the same way, and on the Web, that means anyone in the world. If you have ever tackled a data integration problem, you know how hard that is to achieve even in a single company. For example, how long is a man-year: 2200 hours (the number of hours I get paid, including holidays)? Or 1700 hours (the hours I’m supposed to be clocked in)? Or 1500 hours (the hours a project manager can expect, after allowing for training, etc.)? Getting such facts wrong wastes hours sorting out the systems that have used them.
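The man-year ambiguity turns into pounds and pence very quickly. A worked sketch, with an invented hourly rate, showing how far apart the same two-man-year estimate lands under each definition:

```python
# How one ambiguous term spreads a cost estimate. The hourly rate is
# invented for illustration; the hours are the three definitions above.

HOURLY_RATE = 50.0  # pounds per hour, hypothetical
definitions = {"paid": 2200, "clocked_in": 1700, "productive": 1500}

estimate_man_years = 2
costs = {
    name: estimate_man_years * hours * HOURLY_RATE
    for name, hours in definitions.items()
}

for name, cost in costs.items():
    print(f"{name}: £{cost:,.0f}")

spread = max(costs.values()) - min(costs.values())
print(f"spread: £{spread:,.0f}")  # £70,000 of disagreement from one term
```

Two systems that silently assumed different rows of that table will disagree by tens of thousands of pounds before anyone notices.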


Unfortunately there seem to be far more academic brownie points in writing papers involving complex proofs of obscure points in logic than in solving the poorly characterised problems of data quality, or in training people to understand how they – and other people – actually use data. Which is why I (among others) wonder whether the £40 million being used to set up the new British Institution for Web Science is money well spent.


While I would not want to see academics starve, the government would be better off implementing conventional data standards, such as EDXL for communication between emergency services. And the cost savings for government from actually using existing standards could be enormous. The medium is the message, and in this case, the message “create a Semantic Web Institute” is that data interoperability is a terrifyingly difficult technical problem best solved by academics. It is not. It is a painfully methodical approach to finding out what people say and what exactly they mean by what they say, and then checking whether two people say and mean the same thing. It’s more about people than machines, and particularly about understanding precisely what they mean when they say something. For those brought up with the oleaginous obfuscations of “Yes, Minister”, this is probably why the Sir Humphreys of the civil service would rather we were distracted by a Web Science Institute.”


P.S. From Sean Barker on 29th April – I cite Siri as an example of semantic web type applications. It may be worth adding a comment that it has been bought by Apple.



Join the conversation



One could assert that relational databases are worthless because they don't solve the problem of data quality, but that would be silly. Codd came up with an architecture for a database model based on a mathematical formalism, and it turned out to work well for a large class of data.

The semantic web offers an architecture that works well for a lot of other data, especially in distributed environments such as the web. Implementations must address the issue of data quality, as they do in relational databases, but it's still a separate issue that doesn't diminish the usefulness of this model for many classes of data.

Dan Brickley effectively debunked the tired old argument that the semantic web is an academic exercise at . It's everyone's right to complain when their government allocates money somewhere that they feel is inappropriate, but as Dan showed, the idea that the semantic web has been an academic exercise all along is also silly.

The Dan Brickley link seems to be broken.

Academic research proposals (in general - I haven't seen the proposal for the Web Science Institute) suffer from the same problem that has beset software houses for decades: you have to promise more than you realistically expect to deliver or you don't get the contract. Over the years, customers have found a few ways to rebalance software development contracts so that the developers have to deliver much of what they promise, at their own cost if they overrun. This option isn't generally available to those funding research.

Furthermore, science isn't engineering, so it would be wrong to expect Web Science to deliver major transformations in the marketplace quickly. Most computer science delivers over a long period - look at when Ted Codd first published his work, and at the foundational work that preceded it.

Research on data provenance and how best to exploit metadata is important (though it's a pity that it has to be called the Semantic Web and Web Science to get funded). Just don't expect it to solve all the problems of data quality, and certainly not soon.

>Dan Brickley link

When this blog software converted the URL into a link, it included the period after "html", which shouldn't be part of the link. If you paste the URL into the browser and remove the final period it works.


Have just edited the original so hopefully the link works now - Philip

This assessment would benefit from higher quality data. I assume you're referring to the Institute for Web Science, to be funded in the amount of £30 million? Credibility suffers further with the assertion of a "Semantic Web Institute" in place of the actual undertaking.