In my experience of working on large data management projects, there has been one common theme: quality of the underlying data in corporate systems is always worse than expected at the outset of the project, and data quality strategy needs to be rethought.
For example, one 2009 Information Difference survey found that on average, companies estimated up-front that data quality strategy and activities would take up around 10% of the effort in their master data management (MDM) projects; in reality, those same companies reported that it actually took up 30% of their time.
In one project I worked on, there was a customer master database of all the companies that the client thought it dealt with -- 25,000 in all. After a thorough cleanup exercise, it was found that many records were duplicated and others were defunct; the final tally was just 5,000 customers, one-fifth of the original perception. This is not that rare: It is normal in dealing with product or materials master files to find duplication rates of 20% to 30%.
Enough spare parts for 90 years
Data duplication on that level has a real cost. One pharmaceutical company I spoke to had five main UK manufacturing centres, each with its own warehouse of spare parts for the machines in the factories. In theory, all five sites shared a common system, so spare parts -- 65,000 inventory items in all -- could be ordered from another location. But in reality, the system was hard to use, so each of the separate sites built up its own inventory of spare parts sufficient for its needs. More than sufficient, in fact: After a data cleanup, it was discovered that the company had enough spare parts to last 90 years in some cases. This project delivered benefits that dramatically outweighed its costs, with a net present value of £3.5 million and a payback period of a few months.
And that was not an isolated example. In another case of ill-begotten data quality strategy, a global manufacturer of electronics goods outsourced the building of boxes for its products to be delivered to retail stores. For one new product, the company went through the usual specification process, but the dimensions of the boxes quoted to the suppliers were in centimetres rather than millimetres. In a scenario reminiscent of a well-known scene in the movie Spinal Tap (though in reverse), the mistake only came to light when trucks started to arrive at warehouses with boxes 10 times too large in every dimension. This little error cost €15 million to sort out.
I could go on, but the point of these anecdotes is that they are surprisingly representative of the state of data in many corporate systems. In another Information Difference survey, this one from 2010, just 42% of respondents said that they had data quality technology deployed in their main corporate systems, and barely one-third of those organisations reported that the tools were in widespread, systematic use.
There are a number of reasons for this state of affairs. A critical one is that it is difficult to fully align the interests of those entering data into systems with those of the larger corporation. A telesales representative taking an order cares a lot about the customer details and payment details, since his commission probably depends on them. But is he quite so fussed about the credit reliability of the customer or the demographic background information that marketing has asked telesales workers to capture? As this additional data is probably not something reps are directly assessed on and is instead intended for the greater benefit of the organisation, the answer is probably not.
This is not to say that employees wilfully enter bad data into systems, but inevitably their level of attention and diligence is related to their direct interests. One of the few systems in a company with carefully checked data on a universal basis is the personal expenses system, where employees have a vested interest in making sure that their expenses are fully recorded and paid; the payroll system is another. The only way around the issue of faulty data entry is to enforce business rules at the source system, using data quality software that checks for duplicates and likely errors -- though as we have seen, few companies deploy such software widely.
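To make the idea of a duplicate check at the point of entry concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular data quality product works: the function names, the list of legal suffixes and the 0.9 similarity threshold are all assumptions chosen for the example, and real tools use far richer matching rules.

```python
import re
from difflib import SequenceMatcher

# Hypothetical set of legal-form suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"ltd", "limited", "plc", "inc", "llc", "gmbh", "co"}


def normalise(name: str) -> str:
    """Lower-case the name, strip punctuation and drop legal suffixes."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(w for w in words if w not in LEGAL_SUFFIXES)


def likely_duplicates(new_name, existing_names, threshold=0.9):
    """Return existing names whose normalised form closely matches new_name.

    The threshold is an assumed value; it would need tuning on real data.
    """
    target = normalise(new_name)
    matches = []
    for candidate in existing_names:
        score = SequenceMatcher(None, target, normalise(candidate)).ratio()
        if score >= threshold:
            matches.append(candidate)
    return matches


# Invented sample records, for illustration only.
existing = ["Acme Ltd.", "Globex Corporation", "Initech PLC"]
flagged = likely_duplicates("ACME Limited", existing)
```

In this sketch, a new entry of "ACME Limited" would be flagged against the existing "Acme Ltd." record before it is saved, so the person entering the data can confirm whether it is genuinely a new customer.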
I believe this situation exists partly because data quality is a subject that is hard to get excited about; no ambitious young graduate ever dreams of eventually being promoted to data quality manager, at least not that I’ve found. In the 2010 Information Difference survey, 68% of the responding companies admitted to not measuring the cost of poor-quality data. Without such hard data, it is difficult to get the attention of senior business executives, and so a vicious circle develops: Some people may be aware of a data quality problem, but no one feels motivated to deal with it. Senior management doesn’t know about the problem, and no one further down the corporate hierarchy wants to be the bearer of bad tidings to their bosses.
This all matters because the poor state of data quality strategy is a key barrier to enterprises being able to run their business effectively. Business decisions depend on good quality information, yet many organisations are effectively running blind because they either do not have all the timely and accurate information they need or, in some ways even worse, think they do have it, but in fact the data is flawed.
Companies need to take this issue seriously and start measuring the state of their data quality and what it is costing them, then put in place processes and technology that can radically improve things. But since doing so involves people changing their behaviour rather than an advance in technology, I am not holding my breath.
Andy Hayler is co-founder and CEO of analyst firm The Information Difference and a regular keynote speaker at international conferences on MDM, data governance and data quality. He is also a respected restaurant critic and author (www.andyhayler.com).