Clean data, clean profit

Big cost savings and improved return on investment are just two of the benefits of cleaning up your data and making sure it stays that way. Sally Whittle reports

British troops in Iraq cannot survive without their supplies: everything from boots to basic rations is shipped out from the UK to the Gulf. In a typical month, soldiers in Iraq receive 25,000 pallets of goods sent from warehouses in the UK.

The Ministry of Defence is responsible for ensuring that troops receive the right supplies at the right time. That is no easy feat when the IT infrastructure consists of three separate supply chain systems that pull data from 850 different applications.

To make matters worse, each system handles data differently. A stock code might refer to a cold-climate ration pack in one application but denote a car battery in another. This raises the question: how do staff know whether the order numbers being processed are correct?

Data quality

Data quality is an issue for every IT director, says Ian Charlesworth, a senior research analyst with Butler Group. "Absolutely no organisation can stand up and say their data is 100% accurate. That's never going to happen," he says. "The key is to identify which data really needs to be accurate, and how you can achieve that."

In the case of the MoD, it is vital to ensure that stock numbers are consistent across all applications. The government recently invested in data quality software from Trillium Software to reconcile UK and Nato stock codes in the supply chain.

Data quality software used to be found only in marketing departments, where it was used to tidy up mailing lists. However, organisations have started to realise the importance of data quality, says Simon Gay, a consultancy practice leader at Computacenter. "Problems with inconsistent data have compliance implications, particularly for financial services companies, which need to ensure they comply with counter-terrorism measures," he says.

The Carphone Warehouse recently completed a 12-month data quality project focusing on improving the data in its customer relationship management system. The company needed this system to help to provide a single view of its customers but the project was undermined by unreliable data.

Information was collected and stored in three different systems depending on whether customers bought a mobile phone, insurance or a landline. Some customers might appear in two or three of these silos, but poor data quality made it difficult to confirm, says Robert Kent, CRM programme manager at the Carphone Warehouse. "We couldn't be confident in our data, so we couldn't be confident in marketing to these customers," he says.

An audit in September 2003 revealed missing information, duplication and inconsistency in almost 10% of the customer data. "The most common problems were incorrect telephone numbers or missing e-mail addresses. It was not anything catastrophic, but we were talking to customers without the full picture," he says.

Human error

The Carphone Warehouse's experience is not unusual. On average, 20% of corporate data is incorrect, incomplete or inconsistent, says Charlesworth. Most errors are caused by people: a sales clerk taking down the wrong customer address, for example, or an accounts clerk adding an extra zero. Other problems can be created when data is shared between applications that read information differently. Despite this, Butler Group estimates that fewer than one in 10 companies routinely checks its data quality.

One explanation for this is that organisations do not realise what the cost of bad data is. "Companies must wake up to the real cost of poor data quality and the opportunity to reduce costs by improving their data," says Andreas Bitterer, a vice-president of research with Meta Group.

The most common fall-out from bad data is reduced return on investment for enterprise applications, he says. For example, a CRM system may enable your company to send catalogues to 10,000 potential customers. But if 10% of the addresses are incorrect and another 15% are duplicated, you could end up wasting thousands of pounds. "It's the old saying of rubbish in, rubbish out," he says. "Not to mention the fact that sending two separate catalogues to Andy Smith and Andrew Smith just makes you look incompetent."
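The arithmetic behind that example is easy to sketch. The short Python snippet below is an illustration only: the function name is made up, the unit cost of £2 per catalogue is a hypothetical figure that does not come from Bitterer, and the calculation assumes the incorrect and duplicated addresses do not overlap, so it gives an upper bound.

```python
def wasted_mailing(total, bad_address_pct, duplicate_pct, unit_cost):
    """Estimate catalogues (and spend) lost to bad or duplicated addresses.

    Assumes the two groups of defective records do not overlap,
    so the result is an upper bound on the waste.
    """
    wasted_items = total * (bad_address_pct + duplicate_pct) // 100
    return wasted_items, wasted_items * unit_cost


# 10,000 catalogues, 10% wrong addresses, 15% duplicates (from the example);
# the cost of 2.00 pounds per catalogue is a hypothetical assumption.
items, cost = wasted_mailing(10_000, 10, 15, 2.00)
print(f"{items} catalogues wasted, costing about {cost:.0f} pounds")
```

Under those assumptions a quarter of the mailing is wasted, which is where the "thousands of pounds" comes from.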

The first step in addressing a data quality problem is accepting that you have one and that it needs to be addressed. While this might seem self-evident to IT directors, chief executives might need a little persuasion, says Bitterer. "The problem is hidden from the business to a certain extent, and it is difficult to quantify the costs," he says. "But if nothing else, improving data quality will reduce your administration costs and will improve compliance and corporate governance."

The next step is to find out just how bad your data quality problem is. There are a number of data audit and data profiling tools on the market from suppliers such as Trillium, Firstlogic and Datanomic. All of these tools use logic or artificial intelligence to search through data for inconsistencies or null values. Some tools may also use pattern recognition technology to deduce that a 22-year-old Clare Smith at 1 King Street and 22-year-old Claire Smith at the same address are probably the same person.
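To give a flavour of what such profiling involves, here is a minimal Python sketch. It is not how any of the named products work: the sample records, the field names and the 0.9 similarity threshold are all assumptions made for illustration, and the name matching uses the standard library's difflib rather than any commercial pattern recognition engine.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Illustrative records only; field names are hypothetical.
records = [
    {"id": 1, "name": "Clare Smith", "age": 22,
     "address": "1 King Street", "email": None},
    {"id": 2, "name": "Claire Smith", "age": 22,
     "address": "1 King Street", "email": "c.smith@example.com"},
    {"id": 3, "name": "Robert Jones", "age": 41,
     "address": "9 Mill Lane", "email": ""},
]


def null_fields(record):
    """Return the names of fields that are missing or empty."""
    return [key for key, value in record.items() if value is None or value == ""]


def probable_duplicates(rows, threshold=0.9):
    """Pair up records with the same age and address and near-identical names."""
    pairs = []
    for a, b in combinations(rows, 2):
        same_place = a["address"].lower() == b["address"].lower()
        if a["age"] == b["age"] and same_place:
            score = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
            if score >= threshold:
                pairs.append((a["id"], b["id"]))
    return pairs


for row in records:
    print(row["id"], "empty fields:", null_fields(row))
print("probable duplicates:", probable_duplicates(records))
```

The threshold is the judgement call: set it too low and different people are merged, too high and Clare and Claire stay separate.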

Clean up your act

Once you have identified the problem, software tools are available to clean or to enrich your data. Data cleansing tools will remove duplication, find errors and complete missing numbers or letters in many cases. The software will also highlight any data problems that it has not been able to resolve. For example, there may be empty fields or dates that could be read several ways.
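The ambiguous-date case can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any vendor's product: it considers only slash-separated dates, tries just the day-first and month-first readings, and the function names are invented for the example.

```python
from datetime import datetime


def interpretations(text):
    """Return the distinct dates a slash-separated string could represent."""
    found = set()
    for fmt in ("%d/%m/%Y", "%m/%d/%Y"):  # day-first, then month-first
        try:
            found.add(datetime.strptime(text, fmt).date())
        except ValueError:
            pass  # the string is not valid under this reading
    return sorted(found)


def triage(values):
    """Split values into resolved dates and ones needing manual review."""
    resolved, review = {}, []
    for value in values:
        options = interpretations(value) if value else []
        if len(options) == 1:
            resolved[value] = options[0]
        else:
            review.append(value)  # empty, unparseable or ambiguous
    return resolved, review


resolved, review = triage(["25/12/2003", "03/04/2004", "", "05/05/2004"])
print("resolved:", resolved)
print("needs review:", review)
```

Only one reading is possible for 25/12/2003, and both readings of 05/05/2004 agree, so those are resolved automatically. The empty field and 03/04/2004, which could be 3 April or 4 March, are passed to a person.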

This data will need to be resolved manually, which usually means involving business analysts, says Bitterer. "The IT department will not know whether a customer reference number should relate to an order number, or that invoices are sent to this customer on this date in the month," he says. "If you treat data cleansing as just an IT problem, you won't get very far."

The Carphone Warehouse used data cleansing software to remove inconsistencies from its CRM application and the three underlying databases. However, that was only the beginning of the project. Since cleaning the data, the company has conducted monthly scans to report on the state of its CRM data, and rate its accuracy.

"We spent quite a lot of money enhancing data, buying new e-mail addresses and so on. It's important for us to maintain that quality," says Kent.

Data cleansing should not be seen as a project with a start- and end-point, adds Bitterer. "Probably 75% of companies have done a data audit at some point, but just performing a check once is pointless," he says. "You need to have someone in the organisation who is responsible for ensuring that spot checks are done and the underlying causes of the data quality problem are addressed."

The underlying cause of the data problems at the Carphone Warehouse turned out to be a company policy of giving staff incentives to collect e-mail addresses and home phone numbers from customers. "In many cases, staff were just putting in gibberish numbers or a series of ones and zeros," says Kent.

To prevent the problem happening again, he persuaded managers to change the incentive scheme to dissuade sales staff from entering false information. At the same time, the CRM application was tweaked so that invalid e-mail addresses and phone numbers would not be accepted.
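The article does not say what checks the Carphone Warehouse added, but input validation of this kind might look something like the Python sketch below. The rules are assumptions chosen for illustration: a deliberately loose e-mail pattern, a UK-style phone number of 10 or 11 digits starting with 0, and a crude filter for filler numbers made of only one or two distinct digits.

```python
import re

# Loose shape check: something@something.something, with no spaces.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def valid_email(value):
    """Accept strings that at least look like an e-mail address."""
    return bool(EMAIL_RE.match(value.strip()))


def valid_uk_phone(value):
    """Accept 10 or 11 digit numbers starting with 0 that are not obvious filler."""
    digits = re.sub(r"[\s\-()]", "", value)
    if not re.fullmatch(r"0\d{9,10}", digits):
        return False
    # Reject filler such as 0000000000 or 01010101010 (two or fewer distinct digits).
    return len(set(digits)) > 2


for email in ("kent@example.com", "asdf", "a@b"):
    print(email, valid_email(email))
for phone in ("020 7946 0018", "01010101010", "12345"):
    print(phone, valid_uk_phone(phone))
```

Checks like these stop the most blatant gibberish at the point of entry, but they cannot tell whether a well-formed number actually belongs to the customer, which is why the incentive scheme had to change as well.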

The major weakness of any data cleansing solution is that it might disguise an underlying problem, adds Charlesworth. "It's easy to spend £100,000 and go back a year later to find things are back to square one. The processes or the applications themselves might be causing the problem, and data cleansing won't touch those.

"The benefits of improved data quality can be enormous, but unless you address the business issues, they will only ever be temporary," he says.

Case study: Amec saves up to £1m a project with data cleanser   

Building an oil and gas platform usually takes Amec Oil and Gas three years and involves thousands of individual jobs.

During a single project, engineers working on an oil rig will generate 3 million data records and 80,000 documents. This information, which includes everything from the materials used in individual valves to sophisticated application codes, must be presented to the customer at the end of the project.  

The problem for Amec is finding and presenting that data accurately. "Engineers are not the most patient of people, and they don't care too much how we've asked them to enter information," says Peter Mayhew, information manager at the company.  

For example, engineers entering maximum temperatures will often simply enter a number, without a unit of measurement. "They know what it means, maybe the customer knows what it means, but the computer doesn't," he adds.  
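A check for that kind of omission is simple to sketch in Python. The snippet below is purely illustrative and says nothing about how Amec's own systems work: the accepted units (C, F and K) and the function name are assumptions.

```python
import re

# A number, optionally followed by a unit letter such as C, F or K.
READING_RE = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*(°?[CFK])?\s*$", re.IGNORECASE)


def parse_temperature(text):
    """Return (value, unit), or raise ValueError if the unit is missing."""
    match = READING_RE.match(text)
    if not match:
        raise ValueError(f"not a temperature: {text!r}")
    value, unit = match.groups()
    if unit is None:
        raise ValueError(f"unit missing: {text!r}")
    return float(value), unit.lstrip("°").upper()


for entry in ("450 C", "120.5°f", "450"):
    try:
        print(entry, "->", parse_temperature(entry))
    except ValueError as err:
        print(entry, "-> rejected:", err)
```

A bare number is rejected rather than guessed at, which pushes the question back to the engineer who knows the answer.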

When Amec first identified the data quality problem, it tried to solve the problem manually. A team of staff built an ad-hoc system in Access and attempted to pull data from all relevant systems and reconcile it.

"We have 350 different systems and the number of interfaces was horrifying; you can imagine how well that went," says Mayhew. "We struggled on for a couple of years, but it was obvious we just couldn't cope."  

Instead, Amec worked with data quality specialist Datanomic to develop a bespoke information management system called Orbiss, which can automatically clean, store and manage data through a project, offering vastly improved data quality.  

Mayhew estimates that the software saves the company up to £1m per project by removing the need to manually correct data. "It used to be that we would either spend months cleaning data ourselves or we would hand it over to the customer to sort out," he says.

Four key steps to cleaning up dirty data     

  • Audit: use a data profiling or auditing tool to identify the type and location of data defects  
  • Clean: use a data cleansing tool to clean data, remove errors and fix basic problems  
  • Prevention: if possible, use real-time safeguards to prevent new errors entering the system  
  • Compliance: appoint a data steward to be responsible for long-term monitoring, measurement and management of data quality.
