We are all familiar with examples of the poor data quality that pervades large organisations. How many misspelt versions of your name appear on letters and bills sent to you, for example?
There are several underlying reasons for such issues. The first is the quality of the data that companies capture in the first place. Human beings respond to incentives, and the people doing data entry are rarely the highest paid in an organisation. If they work in sales, they care about getting your credit card details right, because otherwise they won’t get paid their commission, but other information about you may be recorded less carefully.
Once data is captured, though, a new set of issues starts to creep in. Data gets out of date quite quickly: in the US, 15% of people move address annually according to the US Census Bureau (in the UK it is about 11%). How confident are you that all the companies and government departments that you interact with are racing to update that personal data of yours?
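To see how quickly this compounds, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that the annual mover rate stays constant, that moves are independent from year to year and that nobody updates the records in the meantime. Under those assumptions, the share of address records still correct after n years is (1 - r)^n for an annual mover rate r.

```python
def share_still_correct(annual_move_rate: float, years: int) -> float:
    """Fraction of address records still accurate after `years` with no updates.

    Assumes a constant, independent annual probability of moving
    (an illustrative simplification, not a demographic model).
    """
    if not 0.0 <= annual_move_rate <= 1.0:
        raise ValueError("annual_move_rate must be between 0 and 1")
    if years < 0:
        raise ValueError("years must be non-negative")
    return (1.0 - annual_move_rate) ** years


# Mover rates quoted in the text: roughly 15% (US) and 11% (UK).
for label, rate in [("US", 0.15), ("UK", 0.11)]:
    for years in (1, 3, 5):
        print(f"{label}: {share_still_correct(rate, years):.0%} still correct after {years} year(s)")
```

On these simplified assumptions, roughly four in ten US address records left untouched for three years would be out of date.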
So far we have been talking about pure data quality: is that address record right or wrong? There is a more insidious problem in large companies, and that is data mastering. According to a 2008 survey by my firm, Information Difference, the average large company has six different systems holding supposedly “master” data about customers and nine in the case of product data, while 13% of survey respondents had over 100 such sources. No one intended this mess to unfold, but most large companies have dozens or even (more realistically) hundreds of separate applications, from ERP to sales force automation and from supply chain to marketing, never mind the abundance of spreadsheets that drive so many companies.
When a new application is implemented, it is populated from scratch, from a spreadsheet or via a feed from an existing system. Ideally that feed will be a properly maintained interface, but it may be a one-off dump of data, after which the two sources start to drift apart as they are maintained separately.
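Drift of this kind is easy to miss until someone reconciles the copies. As a minimal sketch, assuming both systems can export their records as dictionaries keyed by a shared customer ID (the systems, IDs and fields below are hypothetical), a key-by-key comparison shows records that exist on only one side and fields whose values now disagree.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def reconcile(source: Dict[str, Record], target: Dict[str, Record]) -> Dict[str, list]:
    """Compare two snapshots keyed by a shared ID and report where they have drifted."""
    only_in_source = sorted(source.keys() - target.keys())
    only_in_target = sorted(target.keys() - source.keys())
    mismatches: List[Tuple[str, str, str, str]] = []
    for key in sorted(source.keys() & target.keys()):
        src_rec, tgt_rec = source[key], target[key]
        # Check every field present in either version of the record.
        for field in sorted(src_rec.keys() | tgt_rec.keys()):
            if src_rec.get(field) != tgt_rec.get(field):
                mismatches.append((key, field, src_rec.get(field, ""), tgt_rec.get(field, "")))
    return {"only_in_source": only_in_source, "only_in_target": only_in_target,
            "mismatches": mismatches}


# Hypothetical extracts: a CRM system and a billing system loaded from it years ago.
crm = {
    "C1": {"name": "Jane Smith", "postcode": "AB1 2CD"},
    "C2": {"name": "Raj Patel", "postcode": "EF3 4GH"},
}
billing = {
    "C1": {"name": "Jane Smyth", "postcode": "AB1 2CD"},
    "C3": {"name": "Li Wei", "postcode": "JK5 6LM"},
}
print(reconcile(crm, billing))
# {'only_in_source': ['C2'], 'only_in_target': ['C3'], 'mismatches': [('C1', 'name', 'Jane Smith', 'Jane Smyth')]}
```

In practice the copies often lack a reliable shared key, which is where matching algorithms come in.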
Even if this problem is avoided, companies buy other companies, and an acquired company’s computer systems are not magically integrated overnight: it may take years for even a semblance of integration. For a global company making many acquisitions a year, it is easy to see that even the purest, best-organised technology architecture will quickly become prone to inconsistent data.
Enter master data management
This is where master data management (MDM) comes in. The subject is not new, but over the last decade or so many technologies have been developed that provide dedicated hubs for managing master data (as distinct from transactional data). The idea is that such a hub provides a single, authoritative source of master data that feeds the other systems that need it. However, although MDM is hardly in its infancy, it is barely a teenager in terms of maturity, and relatively few companies have fully and successfully implemented it across the complete scope of their enterprise and across all data domains.
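At the heart of such a hub is consolidation: the competing versions of, say, a customer held in different systems are merged into a single “golden” record according to survivorship rules. The sketch below assumes one simple rule, chosen purely for illustration: for each attribute, the most recently updated non-empty value wins. Commercial hubs typically let you configure other rules, such as ranking sources by trustworthiness; the systems, dates and fields here are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class SourceRecord:
    system: str                # hypothetical source system name, e.g. "CRM"
    updated: date              # when this system last changed the record
    attributes: Dict[str, str]


def golden_record(versions: List[SourceRecord]) -> Dict[str, str]:
    """Merge versions of one entity: for each attribute, keep the newest non-empty value."""
    merged: Dict[str, str] = {}
    # Process oldest first, so newer non-empty values overwrite older ones.
    for rec in sorted(versions, key=lambda r: r.updated):
        for field, value in rec.attributes.items():
            if value and value.strip():
                merged[field] = value.strip()
    return merged


versions = [
    SourceRecord("ERP", date(2009, 3, 1), {"name": "J. Smith", "city": "Leeds", "phone": ""}),
    SourceRecord("CRM", date(2010, 6, 15), {"name": "Jane Smith", "city": "", "phone": "0113 496 0000"}),
    SourceRecord("Billing", date(2011, 1, 10), {"name": "", "city": "York"}),
]
print(golden_record(versions))
# {'name': 'Jane Smith', 'city': 'York', 'phone': '0113 496 0000'}
```

The result combines values from all three systems. Whether “newest wins” is the right rule depends on which systems you actually trust, and the hub would then publish the merged record back to the applications that consume it.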
What is apparent is that master data initiatives and data quality are intimately linked. In a 2010 Information Difference survey, respondents said they had allocated 10% of their MDM project budget to data quality activities, yet on average actually spent 30%, three times what they had intended. What is a little odd is how long it has taken many MDM suppliers to become fully aware of this.
Data quality market
In the early days, few MDM suppliers had integrated data quality offerings; most offered optional “partner” arrangements with data quality suppliers such as Trillium and Address Doctor (since acquired by Informatica). One limitation has been that the data quality market has grown up mainly around customer names and addresses, so there are scores of suppliers that are good at handling local postal addresses but very few that can provide useful input into other data domains, such as product or asset data.
Such domains are much more complex and less structured than customer names and addresses, which makes it harder to apply simple rules in the way that well-known algorithms such as Soundex and Levenshtein distance can be applied to customer data. Relatively few data quality suppliers have strayed beyond customer data, though a few, such as Datactics, Inquera and Silver Creek (since acquired by Oracle), specialise in product data.
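To make the contrast concrete, here is a minimal sketch of the two algorithms named above: American Soundex, which codes a surname by how it sounds, and Levenshtein edit distance, which counts the single-character insertions, deletions and substitutions needed to turn one string into another. These are textbook versions written for illustration, not any supplier’s implementation, and the product descriptions are invented.

```python
# Soundex digit for each coded consonant; vowels, Y, H and W are not coded.
_SOUNDEX_CODES = {ch: digit
                  for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                                         ("L", "4"), ("MN", "5"), ("R", "6"))
                  for ch in letters}


def soundex(name: str) -> str:
    """American Soundex: first letter plus three digits coding the consonant sounds."""
    chars = [c for c in name.upper() if c.isalpha()]
    if not chars:
        return ""
    result = chars[0]
    prev = _SOUNDEX_CODES.get(chars[0], "")
    for ch in chars[1:]:
        if ch in "HW":
            continue  # H and W are skipped without breaking a run of equal codes
        digit = _SOUNDEX_CODES.get(ch, "")  # vowels (and Y) have no code
        if digit and digit != prev:
            result += digit
            if len(result) == 4:
                break
        prev = digit  # a vowel resets prev, so equal codes either side of it both count
    return result.ljust(4, "0")


def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming, keeping only one previous row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


# Customer names: both techniques work well.
print(soundex("Smith"), soundex("Smyth"), levenshtein("Smith", "Smyth"))  # S530 S530 1

# Product descriptions: character-level similarity misleads.
diff_part = ("Bolt hex M8 x 40 zinc", "Bolt hex M6 x 40 zinc")
same_part = ("Bolt hex M8 x 40 zinc", "Zinc plated M8x40 hex bolt")
print(levenshtein(*diff_part))  # 1, yet these are different parts
print(levenshtein(*same_part))  # much larger, yet these describe the same part
```

Smith and Smyth get the same Soundex code and an edit distance of 1, exactly what a name matcher wants. The two bolts, by contrast, are a single character apart yet are different parts, while the reworded description of the same part is many edits away: without domain-specific parsing of attributes such as size and finish, simple string rules say little about product identity.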
I believe this integration has further to run. While it is possible to have a data quality initiative without considering MDM, the reverse is not true: every MDM project must have a data quality component. If you think otherwise, you are likely to be one of those companies that quickly find the data quality effort consuming an unexpectedly large part of the MDM budget. This is because data quality is always worse than people reckon: I have never seen an MDM project where the data quality turned out better than expected.
For MDM software suppliers, more thought should go into how data quality can be seamlessly embedded in their software, especially when it comes to data quality beyond customer names and addresses. There are plenty of data quality suppliers out there, so there is no shortage of partnership and acquisition opportunities. Just having a list of loosely linked “partners” is unlikely to be good enough these days.
From an enterprise viewpoint this means that data quality should be a central part of your MDM initiative, and you need to consider it when evaluating software and planning your project. You need to probe suppliers on what data quality functionality they provide, how well integrated it is, and how well it will work on your particular data.
A vendor’s demo of matching algorithms on customer data means nothing if the main thrust of your MDM initiative is around product, asset or financial data. Above all, make sure that you allow sufficient resources for the data quality component of your project.
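One practical way to check how a tool might cope with your particular data is to profile a sample extract yourself before any demo. The sketch below uses only the Python standard library and reports, per column, the share of missing values, the number of distinct values and the most common “shape” of the values (letters mapped to A, digits to 9). The column names and sample rows are hypothetical, and dedicated profiling tools go much further.

```python
import csv
import io
import re
from collections import Counter
from typing import Dict, Iterable


def value_shape(value: str) -> str:
    """Reduce a value to a pattern: letters -> 'A', digits -> '9', other characters kept."""
    return re.sub(r"[0-9]", "9", re.sub(r"[A-Za-z]", "A", value))


def profile(rows: Iterable[Dict[str, str]]) -> Dict[str, Dict[str, object]]:
    """Per-column missing percentage, distinct count and most common value shape."""
    rows = list(rows)
    if not rows:
        return {}
    report: Dict[str, Dict[str, object]] = {}
    for col in rows[0].keys():
        # Short CSV rows yield None, so treat None like an empty value.
        values = [(r.get(col) or "").strip() for r in rows]
        filled = [v for v in values if v]
        shapes = Counter(value_shape(v) for v in filled)
        report[col] = {
            "missing_pct": round(100 * (len(values) - len(filled)) / len(values), 1),
            "distinct": len(set(filled)),
            "top_shape": shapes.most_common(1)[0][0] if shapes else None,
        }
    return report


# Hypothetical product extract; in practice you would open a real CSV file instead.
sample = io.StringIO(
    "part_no,description,weight_kg\n"
    "AB-1001,Bolt hex M8 x 40 zinc,0.02\n"
    "AB-1002,,0.03\n"
    "ab1003,Washer M8,\n"
)
for column, stats in profile(csv.DictReader(sample)).items():
    print(column, stats)
```

Even this toy extract surfaces missing descriptions and weights and an inconsistently formatted part number (ab1003 against the dominant AA-9999 pattern), the kind of finding worth putting to a supplier when you ask how its tools would handle your data.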
About the author
Andy Hayler is co-founder and CEO of The Information Difference and a keynote speaker at conferences on master data management, data governance and data quality. He is also a restaurant critic and author (www.andyhayler.com).