Entropy and data quality: How to keep your data accurate

The quality of data held in IT systems will deteriorate unless steps are taken to maintain its accuracy and consistency. We look at what businesses should be doing to control data quality more effectively.

IT systems are subject to many of the same laws that govern the rest of the universe, however much IT suppliers and users might like to think otherwise.  

One of the most fundamental laws of physics, the second law of thermodynamics, essentially states that, left to itself, the disorder (entropy) in an isolated system will increase over time; order can only be maintained by applying energy from outside. 

Restated in IT terms, this means the quality of data held in IT systems will deteriorate unless steps are taken to maintain its accuracy and consistency. Given that a majority of organisations rank the data they hold as a primary asset, what should you be doing to look after its quality more effectively?

Data drift

The value of data to organisations is clear, if often hard to quantify precisely. 

Few companies, however, have the processes and tools in place to maintain the quality of their data, and certainly not as a matter of routine. Indeed, it is fair to say that most organisations act on data quality only when they implement new systems, undergo a major upgrade, need to integrate with another platform, or when something goes disastrously wrong. 

As a consequence, the accuracy of the information held in IT systems, even that in business-critical databases, will "drift" over time.

Recently, Freeform Dynamics looked at data quality from the perspective of IT professionals and of line of business managers.

Human error

Reasons why data quality degrades over time include typing mistakes when editing records, software bugs or communications glitches introducing errors, unverified external data sources being used to update or supplement information, and records not being updated at all because of pressure of work or simple oversight. 

Given that many of these factors are down to human error, there is an opportunity for automated IT solutions to mitigate or remove such errors and data degradation.

The consequences of poor data quality are not hard to find, and nearly all users, across all lines of business, recognise them. 

Depending on your business, examples could include products being shipped to the wrong address because the address updated in the sales system had not been synchronised with the logistics database, or the customer service desk being unable to respond quickly when a client calls because the information it holds is inaccurate.

Another instance could be different internal systems providing a senior manager with different results, meaning that they must then spend additional time working out which one is correct. 

Such problems inevitably result in users spending more time on routine activities than would be needed if organisations had addressed their data quality issues. Perhaps more importantly, poor data quality can lead to organisations taking decisions based on inaccurate or out-of-date information, potentially with expensive consequences.

This makes it hard to understand why so few organisations implement formal processes to preserve data quality over time. There is no "silver bullet" IT solution which delivers data quality without effort, but there are many steps which IT and business managers are able to take to improve matters.


Seeking the truth

The starting point for improving data quality and integrity is deciding just what data source, or sources, should be regarded as holding the "truth" when considering any data record. This sounds simple. Alas, it is rarely straightforward, not least because so few organisations maintain all essential information in a single data store or database. 

Organisations also need to recognise that elements of internal politics may come into play. Thus, it is essential that IT consults with line of business managers to avoid the "My data is more important/complete/rich/valid/up to date than yours" discussion.

Once decisions have been made as to which sources are reliable, the next step is to create a high-level architecture that describes the mechanisms by which IT will maintain data quality as part of routine IT and business operations. This is quite different to occasional attempts to cleanse data sets.  
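One way to make the "which source holds the truth" decision explicit, rather than leaving it as tribal knowledge, is to record it as a per-field precedence policy that systems can consult. The sketch below is purely illustrative: the source names (`crm`, `logistics`, `finance`), the field names and the `resolve_field` helper are all hypothetical, and in practice the policy itself would be the outcome of the IT and line-of-business discussions described above.

```python
from typing import Any, Optional

# Hypothetical "source of truth" policy agreed between IT and the business:
# for each field, the systems to trust, in order of precedence.
SOURCE_OF_TRUTH: dict[str, list[str]] = {
    "delivery_address": ["logistics", "crm"],
    "email": ["crm"],
    "credit_limit": ["finance"],
}


def resolve_field(field: str, records: dict[str, dict[str, Any]]) -> Optional[Any]:
    """Return the value from the highest-precedence source that holds one."""
    for source in SOURCE_OF_TRUTH.get(field, []):
        value = records.get(source, {}).get(field)
        if value not in (None, ""):
            return value
    # No authoritative value: better to flag the gap for follow-up than to guess.
    return None


if __name__ == "__main__":
    records = {
        "crm": {"delivery_address": "1 Old St", "email": "ann@example.com"},
        "logistics": {"delivery_address": "2 New Rd"},
    }
    print(resolve_field("delivery_address", records))  # logistics wins
    print(resolve_field("credit_limit", records))      # no finance record
```

Keeping the policy as data rather than burying it in application code means a change of mind about which system is authoritative is a one-line edit that every consumer picks up.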

Data verification and auditing 

At heart, the objective is to build an IT environment which is supported by operational processes, where important data is verified and, where suitable, enriched at the time it is created, captured or updated. 
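Verifying data at the moment it is captured can be as simple as normalising a record and rejecting it if basic checks fail, before it is ever written to the database. The following is a minimal sketch under stated assumptions: the `Customer` record, the `capture` function and the in-memory list standing in for a database are hypothetical, and the email and UK postcode patterns are deliberately simplified for illustration rather than complete validators.

```python
import re
from dataclasses import dataclass

# Simplified patterns for illustration only; production-grade email and
# postcode validation is considerably more involved.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
UK_POSTCODE_RE = re.compile(r"^[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}$")


@dataclass
class Customer:
    name: str
    email: str
    postcode: str


def normalise(c: Customer) -> Customer:
    """Trim whitespace and standardise case before any checks are applied."""
    return Customer(
        name=c.name.strip(),
        email=c.email.strip().lower(),
        postcode=c.postcode.strip().upper(),
    )


def validate(c: Customer) -> list[str]:
    """Return a list of problems; an empty list means the record is acceptable."""
    problems = []
    if not c.name:
        problems.append("name is missing")
    if not EMAIL_RE.match(c.email):
        problems.append(f"email looks invalid: {c.email!r}")
    if not UK_POSTCODE_RE.match(c.postcode):
        problems.append(f"postcode looks invalid: {c.postcode!r}")
    return problems


def capture(c: Customer, store: list[Customer]) -> list[str]:
    """Verify at the point of entry so only clean records reach the store."""
    clean = normalise(c)
    problems = validate(clean)
    if not problems:
        store.append(clean)
    return problems
```

Returning the list of problems, rather than silently dropping bad input, lets the user or calling system correct the record at source, which is where fixing it is cheapest.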


Most data repositories will probably still benefit from periodic auditing, cross-referencing and validating to ensure that quality is maintained. The challenge is to make such processes both simple to undertake and cost effective. 
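Periodic cross-referencing can start with a straightforward comparison of two copies of the same records, keyed by a shared identifier, reporting what is missing from either side and which fields disagree (for example, a customer address held in both a sales system and a logistics database). This is a minimal sketch: the `cross_reference` function and the sample data are hypothetical, and it compares values exactly, so real data would normally be normalised first to avoid flagging trivial differences such as case or spacing.

```python
from typing import Any


def cross_reference(
    primary: dict[str, dict[str, Any]],
    secondary: dict[str, dict[str, Any]],
    fields: list[str],
) -> dict[str, list]:
    """Compare two copies of the same records, keyed by a shared ID."""
    report: dict[str, list] = {
        # Records present in one system but not the other.
        "missing_in_secondary": sorted(primary.keys() - secondary.keys()),
        "missing_in_primary": sorted(secondary.keys() - primary.keys()),
        # (id, field, primary value, secondary value) for every disagreement.
        "mismatches": [],
    }
    for key in sorted(primary.keys() & secondary.keys()):
        for field in fields:
            a, b = primary[key].get(field), secondary[key].get(field)
            if a != b:
                report["mismatches"].append((key, field, a, b))
    return report


if __name__ == "__main__":
    sales = {"C1": {"address": "2 New Rd"}, "C2": {"address": "5 High St"}}
    logistics = {"C1": {"address": "1 Old St"}, "C3": {"address": "9 Mill Ln"}}
    for section, items in cross_reference(sales, logistics, ["address"]).items():
        print(section, items)
```

Run on a schedule, a report like this turns silent drift into a visible worklist; which of the two values is correct should then be settled by the agreed source of truth rather than by whoever happens to look first.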

Fortunately, data cleansing tools have advanced dramatically in recent years. In addition, there are service providers, including cloud suppliers, that specialise in this area. Equally important, best practice guidance is emerging.

In this context, each organisation must consider whether all data requires the same level of cleansing and integrity maintenance. Because data cleansing and integrity checking involve effort, and therefore cost, managers should bear in mind that not all data is of equal business value. Which data belongs in which category (and why) should be a business decision, not an IT one. 

Data is the lifeblood of many organisations. Impure data is like impure blood – not good for the system. If your organisation does not recognise the effects of entropy on data, it is time to step back and consider the possible consequences and costs. 

Your objective is to deliver appropriately accurate data on a continuous basis, rather than through periodic data cleansing projects, which may leave you vulnerable in between.

Tony Lock is programme director at analyst group Freeform Dynamics.