Using real business data to test new applications is vital, but take precautions, says David Chalmers
Testing is a vital stage in the development of an application. Businesses have to be sure they have given their applications as realistic a test as possible, so testing often uses real, live data from production databases. That data would normally be in the hands of an IT department or an outsourcing partner.
For example, a bank's IT team might use actual account numbers, sort codes, addresses and names in testing a new CRM system. Or a supplier to a local authority may use real, complete electoral registers to test electronic voting systems.
This exposes sensitive data to real threats. In an era of heightened sensitivity towards data privacy, the testing phase itself may become a liability.
Businesses are exposing themselves to breaches of data protection and customer privacy laws. Their level of risk increases with the type of testing that they are required to undertake.
This risk comes from the need to replicate data to be tested. Creating copies of a company database puts that data at risk. For example, a disgruntled or dishonest employee seeking to damage corporate reputation or steal company data could exploit data that is often stored unprotected as "test" copies.
Using dummy data is simply not a viable option because this replacement data lacks sufficient complexity. Some databases contain records connected by thousands of links, and every link, cross-reference and piece of information is needed for a realistic test. Dummy data cannot match that complexity.
To tackle this quandary, organisations must enable their application testing departments to retain the integrity of test data without exposing its content or compromising its privacy. The answer lies in a focus on the data itself.
It is possible to use real data and make extensive modifications without compromising the value of the information to the testing process. So the real data is used, but it is "masked" so that no one can see or access the actual information.
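As a rough illustration of how values can be masked while the links between records survive, the Python sketch below replaces each sensitive value with a keyed hash. The same input always yields the same masked output, so an account number masked in one table still matches the same account number masked in another. The key, the function names and the sample records are all hypothetical, and this is only one of several possible masking techniques, not a description of any particular product.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be kept out of the test environment.
MASKING_KEY = b"example-key-not-for-production"


def mask_value(value: str, length: int = 12) -> str:
    """Return a stable token for a sensitive value that does not reveal the original."""
    digest = hmac.new(MASKING_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


def mask_account_number(account_number: str) -> str:
    """Mask an account number while keeping its length and all-digit format."""
    digest = hmac.new(MASKING_KEY, account_number.encode("utf-8"), hashlib.sha256).digest()
    number = int.from_bytes(digest, "big")
    width = len(account_number)
    return str(number % 10 ** width).zfill(width)


# Illustrative records: two tables that refer to the same account.
customers = [{"account": "12345678", "name": "Jane Smith"}]
payments = [{"account": "12345678", "amount": 250}]

masked_customers = [
    {"account": mask_account_number(c["account"]), "name": mask_value(c["name"])}
    for c in customers
]
masked_payments = [
    {"account": mask_account_number(p["account"]), "amount": p["amount"]}
    for p in payments
]
```

Because the masked account number is identical in both tables, the cross-reference between customer and payment still holds for testing. Squeezing a hash into a fixed number of digits can occasionally map two different accounts to the same masked value, so a real tool would need to handle such collisions.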
There are currently two approaches to the creation of masked data. The first replicates data in a two-step process: a production copy of the original data is set up and then made into a modified version suitable for testing. The risk of this approach is that, for a time, a complete and unmodified copy of the data exists, which could be hacked or leaked out of the organisation.
A safer alternative is to copy data directly into a modified and masked format. For example, customer names may be changed. Should the resulting masked data be exposed, it is worthless to whoever obtains it, but it still satisfies all relevant criteria for testing.
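The one-step approach might look something like the following Python sketch, which reads rows from a production database and masks each customer name before it is written to the test database, so no unmasked copy is ever stored. The table layout, the key and the function names are assumptions made for illustration, and in-memory SQLite databases stand in for real systems.

```python
import hashlib
import hmac
import sqlite3

# Hypothetical secret key, used only for this illustration.
MASKING_KEY = b"example-key-not-for-production"


def mask_name(name: str) -> str:
    """Replace a customer name with a stable token that does not reveal it."""
    token = hmac.new(MASKING_KEY, name.encode("utf-8"), hashlib.sha256).hexdigest()
    return "Customer-" + token[:8]


def copy_masked(source: sqlite3.Connection, target: sqlite3.Connection) -> None:
    """Copy the customers table, masking names before anything is written."""
    target.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, balance INTEGER)"
    )
    rows = source.execute("SELECT id, name, balance FROM customers")
    # Each row is masked in memory as it is read; only masked rows reach the target.
    target.executemany(
        "INSERT INTO customers (id, name, balance) VALUES (?, ?, ?)",
        ((row_id, mask_name(name), balance) for row_id, name, balance in rows),
    )
    target.commit()


# Stand-in for a production database, with one illustrative record.
production = sqlite3.connect(":memory:")
production.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, balance INTEGER)"
)
production.execute("INSERT INTO customers VALUES (1, 'Jane Smith', 1200)")
production.commit()

test_db = sqlite3.connect(":memory:")
copy_masked(production, test_db)
masked_rows = test_db.execute("SELECT id, name, balance FROM customers").fetchall()
```

The identifier and balance pass through unchanged, so the test system sees realistic values and relationships, while the name held in the test database is no longer the customer's.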
David Chalmers is product strategy director at software supplier Macro 4