Clean data, clean profit
- Posted: 15:28 23 Sep 2004
- Topics: Customer Management
Big cost savings and improved return on
investment are just two of the benefits of cleaning up your data
and making sure it stays that way. Sally Whittle
reports
British troops in Iraq cannot survive without their supplies:
everything from boots to basic rations is shipped out from the UK
to the Gulf. In a typical month, soldiers in Iraq receive 25,000
pallets of goods sent from warehouses in the UK.
The Ministry of Defence is responsible for ensuring that troops
receive the right supplies at the right time. That is no easy feat
when the IT infrastructure consists of three separate supply chain
systems that pull data from 850 different applications.
To make matters worse, each system handles data differently. A
stock code might refer to a cold-climate ration pack in one
application but denote a car battery in another. This raises the
question: how do staff know whether the order numbers being
processed are correct?
Data quality
Data quality is an issue for every IT director, says Ian
Charlesworth, a senior research analyst with Butler Group.
"Absolutely no organisation can stand up and say their data is 100%
accurate. That's never going to happen," he says. "The key is to
identify which data really needs to be accurate, and how you can
achieve that."
In the case of the MoD, it is vital to ensure that stock numbers
are consistent across all applications. The government recently
invested in data quality software from Trillium Software to
reconcile UK and Nato stock codes in the supply chain.
Data quality software used to be found only in marketing
departments, where it was used to tidy up mailing lists. However,
organisations have started to realise the importance of data
quality, says Simon Gay, a consultancy practice leader at
Computacenter. "Problems with inconsistent data have compliance
implications, particularly for financial services companies, which
need to ensure they comply with counter-terrorism measures," he
says.
The Carphone Warehouse recently completed a 12-month data quality
project focusing on improving the data in its customer relationship
management system. The company needed this system to help to
provide a single view of its customers but the project was
undermined by unreliable data.
Information was collected and stored in three different systems
depending on whether customers bought a mobile phone, insurance or
a landline. Some customers might appear in two or three of these
silos, but poor data quality made it difficult to confirm, says
Robert Kent, CRM programme manager at the Carphone Warehouse. "We
couldn't be confident in our data, so we couldn't be confident in
marketing to these customers," he says.
An audit in September 2003 revealed missing information,
duplication and inconsistency in almost 10% of the customer data.
"The most common problems were incorrect telephone numbers or
missing e-mail addresses. It was not anything catastrophic, but we
were talking to customers without the full picture," he says.
Human error
The Carphone Warehouse's experience is not unusual. On average,
20% of corporate data is incorrect, incomplete or inconsistent,
says Charlesworth. Most errors are human in origin: a sales clerk
taking the wrong customer address, for example, or an accounts
clerk adding an extra zero. Other problems can be created when data
is shared between applications that read information differently.
Despite this, Butler Group estimates that fewer than one in 10
companies routinely check their data quality.
One explanation for this is that organisations do not realise what
the cost of bad data is. "Companies must wake up to the real cost
of poor data quality and the opportunity to reduce costs by
improving their data," says Andreas Bitterer, a vice-president of
research with Meta Group.
The most common fall-out from bad data is reduced return on
investment for enterprise applications, he says. For example, a CRM
system may enable your company to send catalogues to 10,000
potential customers. But if 10% of the addresses are incorrect and
another 15% are duplicated, you could end up wasting thousands of
pounds. "It's the old saying of rubbish in, rubbish out," he says.
"Not to mention the fact that sending two separate catalogues to
Andy Smith and Andrew Smith just makes you look incompetent."
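The arithmetic behind that example can be sketched roughly as follows. The mailing size and error rates come from the example above; the cost of £2 per catalogue is purely an assumption for illustration, as is the idea that the bad-address and duplicate groups do not overlap.

```python
def wasted_mailing_cost(mailings, bad_address_rate, duplicate_rate, unit_cost):
    """Estimate how many catalogues are wasted and what they cost.

    Assumes the bad-address group and the duplicate group do not overlap.
    """
    wasted = round(mailings * (bad_address_rate + duplicate_rate))
    return wasted, wasted * unit_cost


# unit_cost=2.00 (pounds per catalogue) is a hypothetical figure
wasted, cost = wasted_mailing_cost(10_000, 0.10, 0.15, unit_cost=2.00)
print(f"{wasted} wasted catalogues, costing about GBP {cost:,.0f}")
```

On those assumptions a quarter of the mailing is wasted, before counting the cost to the company's reputation.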
The first step in addressing a data quality problem is accepting
that you have one and that it needs to be fixed. While this might
seem self-evident to IT directors, chief executives might need a
little persuasion, says Bitterer. "The problem is hidden from the
business to a certain extent, and it is difficult to quantify the
costs," he says. "But if nothing else, improving data quality will
reduce your administration costs and will improve compliance and
corporate governance."
The next step is to find out just how bad your data quality problem
is. There are a number of data audit and data profiling tools on
the market from suppliers such as Trillium, First Logic and
Datanomic. All of these tools use logic or artificial intelligence
to search through data for inconsistencies or null values. Some
tools may also use pattern recognition technology to deduce that a
22-year-old Clare Smith at 1 King Street and 22-year-old Claire
Smith at the same address are probably the same person.
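A very simplified version of such a check might look like the sketch below. It is not how the commercial products named above work; it uses the similarity matcher in Python's standard difflib module, and the 0.85 threshold, the field names, the e-mail values and the third record are all assumptions made for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical sample records; only the two Smith entries echo the article
records = [
    {"id": 1, "name": "Clare Smith", "age": 22, "address": "1 King Street", "email": None},
    {"id": 2, "name": "Claire Smith", "age": 22, "address": "1 King Street", "email": "c.smith@example.com"},
    {"id": 3, "name": "Andrew Jones", "age": 41, "address": "7 Mill Lane", "email": ""},
]


def count_missing(rows, field):
    """Count rows where the field is None or blank."""
    return sum(1 for r in rows if not str(r.get(field) or "").strip())


def probable_duplicates(rows, threshold=0.85):
    """Pair up rows with the same age and address and near-identical names."""
    pairs = []
    for a, b in combinations(rows, 2):
        same_details = a["age"] == b["age"] and a["address"].lower() == b["address"].lower()
        similarity = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
        if same_details and similarity >= threshold:
            pairs.append((a["id"], b["id"]))
    return pairs


print("Missing e-mail addresses:", count_missing(records, "email"))
print("Probable duplicates:", probable_duplicates(records))
```

A real profiling tool would weigh many more fields, but the principle is the same: the pair is flagged as probably the same person, not merged automatically.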
Clean up your act
Once you have identified the problem, software tools are
available to clean or to enrich your data. Data cleansing tools
will remove duplication, find errors and complete missing numbers
or letters in many cases. The software will also highlight any data
problems that it has not been able to resolve. For example, there
may be empty fields or dates that could be read several ways.
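To illustrate the date problem, the sketch below tries to read a numeric date both day-first and month-first and flags it when the two readings disagree. The function name and the status labels are hypothetical, and a real cleansing tool would handle far more formats.

```python
from datetime import datetime


def interpret_date(text):
    """Return (status, candidates) for a numeric date such as '03/04/2004'."""
    if not text or not text.strip():
        return "missing", []
    candidates = set()
    # Try the UK (day-first) and US (month-first) readings in turn
    for fmt in ("%d/%m/%Y", "%m/%d/%Y"):
        try:
            candidates.add(datetime.strptime(text.strip(), fmt).date())
        except ValueError:
            pass
    if not candidates:
        return "invalid", []
    if len(candidates) == 1:
        return "ok", sorted(candidates)
    # Both readings are valid but differ, so a person has to decide
    return "ambiguous", sorted(candidates)


for raw in ("23/09/2004", "03/04/2004", ""):
    print(repr(raw), interpret_date(raw)[0])
```

Here 23/09/2004 can only be 23 September, whereas 03/04/2004 could be 3 April or 4 March and is left for manual review.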
This data will need to be resolved manually, which usually means
involving business analysts, says Bitterer. "The IT department will
not know whether a customer reference number should relate to an
order number, or that invoices are sent to this customer on this
date in the month," he says. "If you treat data cleansing as just
an IT problem, you won't get very far."
The Carphone Warehouse used data cleansing software to remove
inconsistencies from its CRM application and the three underlying
databases. However, that was only the beginning of the project.
Since cleaning the data, the company has conducted monthly scans to
report on the state of its CRM data, and rate its accuracy.
"We spent quite a lot of money enhancing data, buying new e-mail
addresses and so on. It's important for us to maintain that
quality," says Kent.
Data cleansing should not be seen as a project with a start- and
end-point, adds Bitterer. "Probably 75% of companies have done a
data audit at some point, but just performing a check once is
pointless," he says. "You need to have someone in the organisation
who is responsible for ensuring that spot checks are done and the
underlying causes of the data quality problem are addressed."
The underlying cause of the data problems at the Carphone Warehouse
turned out to be a company policy of giving staff incentives to
collect e-mail addresses and home phone numbers from customers. "In
many cases, staff were just putting in gibberish numbers or a
series of ones and zeros," says Kent.
To prevent the problem happening again, he persuaded managers to
change the incentive scheme to dissuade sales staff from entering
false information. At the same time, the CRM application was
tweaked so that invalid e-mail addresses and phone numbers would
not be accepted.
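The article does not say how the Carphone Warehouse implemented those checks. As a rough sketch of the idea, entry-time validation might look something like this; the patterns, the 10-to-11-digit length rule for UK numbers and the function names are all assumptions, and the e-mail test is deliberately loose.

```python
import re

# Loose shape check only: something@something.something, no spaces
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def is_plausible_email(value):
    """Accept values that at least look like an e-mail address."""
    return bool(EMAIL_RE.match(value.strip()))


def is_plausible_phone(value):
    """Accept 10 or 11 digit numbers that are not just ones and zeros."""
    digits = re.sub(r"[\s\-()]", "", value)
    if not digits.isdigit() or not 10 <= len(digits) <= 11:
        return False
    # Reject filler such as 00000000000 or 01010101010
    return not set(digits) <= {"0", "1"}


print(is_plausible_email("asdf"), is_plausible_phone("0101 010 1010"))
print(is_plausible_email("robert@example.com"), is_plausible_phone("020 7946 0123"))
```

Checks like these only catch the crudest filler. They would not, for instance, accept international numbers written with a leading plus sign, and they cannot tell whether a well-formed address actually belongs to the customer.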
The major weakness of any data cleansing solution is that it might
disguise an underlying problem, adds Charlesworth. "It's easy to
spend £100,000 and go back a year later to find things are
back to square one. The processes or the applications themselves
might be causing the problem, and data cleansing won't touch
those.
"The benefits of improved data quality can be enormous, but unless
you address the business issues, they will only ever be temporary,"
he says.
Case study: Amec saves up to £1m a project with data cleanser
Building an oil and gas platform usually takes Amec Oil and Gas three years and involves thousands of individual jobs.
During a single project, engineers working on an oil rig will generate 3 million data records and 80,000 documents. This information, which includes everything from the materials used in individual valves to sophisticated application codes, must be presented to the customer at the end of the project.
The problem for Amec is finding and presenting that data accurately. "Engineers are not the most patient of people, and they don't care too much how we've asked them to enter information," says Peter Mayhew, information manager at the company.
For example, engineers entering maximum temperatures will often simply enter a number, without a unit of measurement. "They know what it means, maybe the customer knows what it means, but the computer doesn't," he adds.
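A minimal sketch of how software might spot such entries is shown below. It is not a description of Amec's system; the accepted unit letters (C, F and K), the optional "deg" and the function name are assumptions for illustration.

```python
import re

# A number, optionally followed by "deg" and a unit letter
TEMP_RE = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*(?:deg\s*)?([CFK])?\s*$", re.IGNORECASE)


def parse_temperature(text):
    """Return (value, unit); unit is None when no unit was entered."""
    match = TEMP_RE.match(text)
    if match is None:
        raise ValueError(f"not a temperature: {text!r}")
    value, unit = match.groups()
    return float(value), unit.upper() if unit else None


for raw in ("120", "120 C", "250 deg F"):
    value, unit = parse_temperature(raw)
    status = "needs review" if unit is None else "ok"
    print(raw, "->", value, unit, status)
```

The bare "120" parses as a number but comes back with no unit, which is exactly the case that has to be queried with the engineer before the record goes to the customer.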
When Amec first identified the data quality problem, it tried to solve the problem manually. A team of staff built an ad-hoc system in Access and attempted to pull data from all relevant systems and reconcile it.
"We have 350 different systems and the number of interfaces was horrifying; you can imagine how well that went," says Mayhew. "We struggled on for a couple of years, but it was obvious we just couldn't cope."
Instead, Amec worked with data quality specialist Datanomic to develop a bespoke information management system called Orbiss, which can automatically clean, store and manage data through a project, offering vastly improved data quality.
Mayhew estimates that the software saves the company up to £1m per project by removing the need to manually correct data. "It used to be that we would either spend months cleaning data ourselves or we would hand it over to the customer to sort out," he says.
Four key steps to cleaning up dirty data
- Audit: use a data profiling or auditing tool to identify the type and location of data defects
- Clean: use a data cleansing tool to clean data, remove errors and fix basic problems
- Prevention: if possible, use real-time safeguards to prevent new errors entering the system
- Compliance: appoint a data steward to be responsible for long-term monitoring, measurement and management of data quality