The job of ensuring data quality is often seen as the sole responsibility of data management professionals in the IT department. But most data quality problems begin where data is generated – on the business side of an organization. A procurement professional creates an extra copy of an invoice. A customer service representative updates the wrong customer file. Those mistakes are then passed down the line to other employees who rely on high levels of data quality to do their jobs successfully.
One of the best ways to avoid problems like these is to weave the responsibility for data quality into the roles of all workers, business and IT professionals alike, says Danette McGilvray, the president of Granite Falls Consulting Inc. McGilvray, who is also the author of Executing Data Quality Projects: Ten Steps to Quality Data and Trusted Information, says that getting everyone to take responsibility for data quality requires organizations to train employees on how individual data quality choices can lead to problems for others.
SearchDataManagement.com recently caught up with McGilvray to talk about how organizations can begin the process of putting the responsibility for data quality squarely on everybody’s shoulders. Here are some excerpts from that conversation, which have been edited for clarity.
How do people on the business side of an organization tend to look at the issue of data quality?
People often look at data quality and say, ‘yes, yes, that is important, but we’re over here doing our project, our process or our business function. You guys go over there and work with your data quality and we’ll catch up with you in a few months.’
It’s often seen as something quite separate and nice to have, yet somehow, there is this idea that it’s really not part of what they’re doing from a business point of view.
Can you give an example of how a worker from the business side of an organization might cause data quality problems?
Let’s say that we’ve done some assessment on our data, we have some problems, and we find out that one of the root causes [for data quality problems] is the people who are actually talking to folks on the phone. They’re a main source of information coming in [but] they really have not been trained at all in some of the things that they need to do that will help prevent data quality problems. [Nor have they been trained about] all the impacts down the line on the other people who use that information.
How should an organization begin the process of expanding the responsibility for data quality beyond IT?
The starting point is to understand your organization and the various roles and responsibilities that have the most impact on data quality. From there, create a plan to develop and deliver relevant presentations, training, and other communications that are appropriate for the various audiences. Then do it.
[Another aspect involves helping non-IT workers to] understand that data quality is really a very integral part of what they need to have for success. Any business function, process, decision and report … has an aspect of information to it and [those workers] want to be able to trust that information to complete whatever [they] do. Data quality is an integral part of [their] success.
What’s the next step?
It can be something as simple as [creating] a module for training that is maybe about 15 minutes … because people in that customer service group, for example, are going to get training anyway. Can we integrate that data quality training as a natural part of that training that they have to do anyway? If so, then it’s going to be much easier. They’ll see it as part of their job and not something special. Not only do we tell them what it is we need them to do, we give them a perspective of [how the data they generate] goes to other places in the company.
Your book discusses the idea that an appreciation for the information life cycle is a key part of the framework for solving data quality problems. First off, how do you define the information life cycle and what is POSMAD?
I put that acronym together as a way to help people remember the high-level phases of the information life cycle. ‘P’ is for plan. We need to plan [for handling data]. ‘O’ is for obtain the data. … ‘S’ is for store and share. When that data comes in, it goes somewhere, maybe multiple places. … ‘M’ is for maintain. We update a record, we correct it and we find that there are duplicates. ‘A’ is for apply. That is where the real value comes in because we do everything else so that we can apply the data and use it in some way. For example, we complete a transaction, we run a report or make a decision from that report and then take action. [And then ‘D’ is for] dispose. Archiving is an example of an activity that would happen in the dispose phase of the information life cycle.
How can understanding of the information life cycle help organizations combat data quality problems?
The whole idea of the information life cycle is very fundamental to data quality. Once a person understands that, it is helpful even if somebody just sends an issue ticket into a service desk that seems to have something to do with data quality. You can immediately start saying: ‘OK, where in the life cycle is this problem? What happened before that? What was it meant to do?’ And you can use that as a way to start doing some tracing and at least start doing some root cause analysis.
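Editor's illustration: for readers who handle such tickets in software, the tracing questions McGilvray describes could be sketched roughly as follows. This is not taken from her book. The names `Phase`, `upstream_phases` and `triage_questions` are hypothetical, and the sketch assumes the six POSMAD phases run in a simple linear order, which real data flows may not follow.

```python
from enum import IntEnum


class Phase(IntEnum):
    """POSMAD phases, numbered in the order the interview lists them."""
    PLAN = 1
    OBTAIN = 2
    STORE_AND_SHARE = 3
    MAINTAIN = 4
    APPLY = 5
    DISPOSE = 6


def upstream_phases(reported):
    """Return the phases that come before the one where the issue surfaced.

    Assumes a strictly linear life cycle (a simplification for illustration).
    """
    return [phase for phase in Phase if phase < reported]


def triage_questions(reported):
    """Build the tracing questions for a ticket reported in a given phase."""
    questions = [f"Where in the life cycle is this problem? -> {reported.name}"]
    for phase in upstream_phases(reported):
        questions.append(f"What happened before that, during {phase.name}?")
    questions.append("What was the data meant to do?")
    return questions


if __name__ == "__main__":
    # Example: a report (the APPLY phase) shows duplicate customers.
    for question in triage_questions(Phase.APPLY):
        print(question)
```

A ticket raised in the apply phase would prompt a look back through plan, obtain, store and share, and maintain, which matches the idea of starting root cause analysis by asking what happened earlier in the life cycle.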