Data is not really perfectible and, ultimately, perfection is the enemy of progress, writes James Richardson, business analytics strategist at Qlik and former Gartner analyst.
Ask people what slows or stops the use of business intelligence (BI) in their organisations and poor data quality is often one of the first things they say.
Now, I’m not going to argue with that view – I’ve been around BI for too long to do so and know that a lack of good quality data is a very real issue, particularly as organisations begin to focus on BI and on using data to drive decisions. In fact, I’ve written in the past on how to approach the procedural issues that give rise to much poor quality data. (If you have a Gartner subscription you can read my 2008 research note ‘Establish a Virtuous Cycle of Business Intelligence and Data Quality’.)
There’s no doubt that addressing data sourcing processes helps ameliorate basic errors, build trust and overcome the resistance that’s common in any BI programme’s early stages.
What I do take issue with is when I hear that “data quality needs to be perfect in order for us to roll out our dashboards”. In this case the aphorism that “perfection is the enemy of progress” is really true. So, why do people set themselves this impossible goal? Well, first you’ve got to consider who’s saying this. In the main, it’s IT or technical staff. They often feel exposed because although they know the data’s safe, secure and backed up (hopefully!), they’ve often no idea if its content is good or bad (and nor should they – it’s not their job). The irony is that it is not until the data is exposed to active usage by managers, decision makers or analysts that the quality and therefore usefulness of data sets becomes truly apparent.
The effort spent improving data has to be balanced against the value of doing so. For most uses, it’s not perfect data that’s needed, but data within acceptable tolerances. Of course, the level of tolerance varies by function and use case. For financial data the tolerance for error is obviously very low. That’s not such an issue, as the perfectibility of financial transaction data is within reach, but only because of the huge effort that goes into its stewardship. The whole practice of accountancy and auditing is fundamentally about data quality, as its aim is to remove as much error from the financial record as possible. The fact that this data is generated and controlled inside our organisations also helps. In other, less regulated, functions the tolerance can afford to be somewhat less rigorous. Why is this? Because people need data to answer business questions right now! Data that’s 80%+ accurate may be enough for many operational or tactical decisions. People may only need a hint at the direction the data is taking overall for it to be valuable, and they may only need it for this instance. Immediacy often trumps purity.
Getting perfect (or even near perfect) data requires herculean efforts. Zeno’s paradox applies. It becomes harder and harder to reach perfection as the volume and diversity of data grow. It simply isn’t possible (or cost effective) to make all sources perfect to the same degree. There’s another big question – what does the perfectionist do about data which is not perfect? Conform it, thereby changing it, with the risk of over-cleaning the data or “correcting” errors incorrectly? We have to accept the fact that data is rarely perfectible, and getting less so, and mature our approach to data quality to ensure that it’s fit for purpose in today’s information environment, where burgeoning and varied data flows into our organisations.
I’d go further and say that the myth of perfect data is more dangerous than decision makers understanding that the data they have is dirty, with anomalies and glitches. Finally, reaching perfection, were it even possible, might not be a good thing. Why? Because perfection also implies a rigid standard, and a fixed frame of reference can itself limit innovative thinking by stopping people from asking really fundamental questions. Blind faith in the certainty of perfect data can never stand up to the shock of the new, to those things we can’t or won’t see coming.