Data is an asset that must be managed like any other in order to control costs and extract value.
How much is your company's data worth? It is a question to which there is no easy answer. No matter what your organisation does, its data is likely to have a high value. Companies that suffer a complete loss of data rarely continue to trade for long.
The University of Texas Center for Research on Information Systems found that 50% of companies that lose their data in a disaster never reopen, and 90% of the rest go out of business within two years of the event.
This is all very well, but it does not help much in putting a value on data. You can value data by estimating the cost of regularly gathering it, storing it in production systems and data warehouses, analysing it, backing it up and so on; or you can value it in terms of what you spend on it, following the simple idea that the money is being spent because it needs to be - even if it is not being spent efficiently.
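As a rough illustration of the cost-based approach, the sketch below adds up the annual spend on each pool of data and ranks the pools by that total. The pool names, cost categories and figures are all hypothetical, chosen only to show the arithmetic; a real exercise would draw them from the organisation's own accounts.

```python
from dataclasses import dataclass


@dataclass
class DataPoolCosts:
    """Annual spend on one pool of data. All names and figures are hypothetical."""
    name: str
    gathering: float
    storage: float
    analysis: float
    backup: float

    def annual_cost(self) -> float:
        # Cost-based valuation: the value is assumed to equal what is spent.
        return self.gathering + self.storage + self.analysis + self.backup


# Illustrative figures only, not taken from any real organisation.
pools = [
    DataPoolCosts("desktop files", gathering=5_000, storage=8_000, analysis=0, backup=2_000),
    DataPoolCosts("orders database", gathering=120_000, storage=40_000, analysis=60_000, backup=15_000),
]

# Rank the pools by annual spend, highest first.
ranked = sorted(pools, key=lambda pool: pool.annual_cost(), reverse=True)

for pool in ranked:
    print(f"{pool.name}: {pool.annual_cost():,.0f} per year")
```

A ranking of this kind says nothing about whether the money is well spent; it only shows where the spending, and so the assumed value, is concentrated.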
If you do a cursory analysis of the data assets of a company, you will find that the value of data varies. The data used by mission-critical systems is, by definition, high-value. The data in lesser systems has less value, the data on PCs perhaps less still. In practice, most of a company's valuable data is held in databases - after all, you would not spend good money on databases otherwise.
But data is not always valuable. As a rule of thumb, data becomes less valuable the older it gets. And if it is less valuable, then it is not worth spending a vast amount of money on. This brings us to the idea of information lifecycle management (ILM) - a phrase that has begun to crop up in IT in the past year or so.
It is worth understanding why ILM has gained currency. An avalanche of legislation in the US from 1999 (Gramm-Leach-Bliley, HIPAA, Sarbanes-Oxley, Homeland Security Act and even some state legislation) made companies liable for the security of their data. At the same time, data protection legislation in Europe tightened and suddenly everyone was concerned about protecting data.
The legislation is mostly concerned with protecting an individual's data to prevent identity theft and other forms of abuse, although Sarbanes-Oxley is an exception. It makes the chief executive and chief financial officer responsible for the accuracy of a company's financial information. This US legislation is having a worldwide impact.
Organisations are beginning to realise they have a legal obligation in respect of at least some of the data they hold in their databases. Also, for many organisations data storage costs are out of control as the amount of information they hold grows dramatically.
ILM is, as the diagram above suggests, a policy which feeds into management and security procedures, which in turn govern both applications and data. An important point to note is that applications and their data cannot be separated here, as the value of data naturally derives from its use. This makes ILM easier to implement for data that is held in databases, since databases record what happens to data and organise it coherently.
Any organisation that wants to address ILM in a thorough way needs to treat it as a project, involving:
- Analysis - where an organisation gathers information on its "universe of data", including details of where it resides, and analyses its usage to determine which data is valuable and why. This involves considering how applications create, change and delete data and how it is stored.
- Classification - where an organisation classifies its data according to its valuation of it and maps the lifecycle of "data pools" from creation to permanent archive or eventual deletion.
- Determination of policy - deciding the policies to apply to managing the data in terms of speed of availability, security, replication, who should have access rights, back-up and recovery.
- Automation - once lifecycle policy has been determined, automation is a matter of implementing the associated procedures, ideally in such a way that they are driven directly from the policy.
Finally, once the policy has been built, a team needs to be responsible for it and for conducting regular reviews as new data pools are added or the corporate use of data changes.
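To make the idea of driving procedures directly from policy more concrete, here is a minimal sketch in which a policy table is the single source of truth and the actions for each data pool are derived from it. The classifications, the one-year "active" threshold, the storage tiers and the retention periods are all assumptions made for illustration, not recommendations.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Policy:
    storage_tier: str
    backup_interval_days: int
    retention_years: int


# Hypothetical policy table: (classification, lifecycle stage) -> policy.
POLICIES = {
    ("high", "active"): Policy("fast disk", 1, 7),
    ("high", "aged"): Policy("archive", 30, 7),
    ("low", "active"): Policy("standard disk", 7, 2),
    ("low", "aged"): Policy("archive", 90, 2),
}

ACTIVE_DAYS = 365  # assumed threshold: data unused for over a year counts as aged


def lifecycle_stage(last_used: date, today: date) -> str:
    """Place a data pool in a lifecycle stage according to when it was last used."""
    return "active" if (today - last_used).days <= ACTIVE_DAYS else "aged"


def policy_for(classification: str, last_used: date, today: date) -> Policy:
    """Look up the policy that applies to a pool; the table alone decides."""
    return POLICIES[(classification, lifecycle_stage(last_used, today))]


def planned_actions(pool: str, classification: str, last_used: date, today: date) -> list:
    """Derive the procedures for a pool from policy rather than hand-coding them."""
    policy = policy_for(classification, last_used, today)
    return [
        f"{pool}: store on {policy.storage_tier}",
        f"{pool}: back-up interval {policy.backup_interval_days} day(s)",
        f"{pool}: retain for {policy.retention_years} years",
    ]


today = date(2024, 6, 1)
for line in planned_actions("orders", "high", date(2024, 1, 1), today):
    print(line)
for line in planned_actions("old campaign files", "low", date(2020, 1, 1), today):
    print(line)
```

The point of the design is that a review team changes only the policy table; the procedures follow from it without being rewritten.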
ILM is not a technology but a process. Some of the capability needed to implement it, or at least gather information about it, is already provided by databases and storage resource management technology. Many information lifecycle policies in respect of back-up, disaster recovery, security and access permissions are already implemented.
ILM is in its infancy and I doubt if any organisation can claim to have implemented anything coherent yet.
Typically there are multiple copies and versions of data, even when it is held in databases. The cost of managing and storing data is not related back to the value of the data in any meaningful way. Important data pools have no value assigned to them, and there is no coherent theory yet of how you decide what constitutes a pool of data.
Most data is not self-identifying - it does not carry explicit information that identifies what it is. Also there is no explicit audit trail of data usage. Database log files provide only limited information.
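For contrast, the sketch below shows what self-identifying data with an explicit audit trail might look like: each record carries a description of what it is, and every access is logged alongside it. The field names and the `record_access` helper are hypothetical and do not describe any particular database product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DescribedRecord:
    """A record that carries its own identity. All field names are hypothetical."""
    payload: dict
    pool: str            # which data pool the record belongs to
    classification: str  # the value the organisation assigns to it
    owner: str           # who is responsible for it
    audit_trail: list = field(default_factory=list)

    def record_access(self, user: str, action: str) -> None:
        # Append an explicit usage entry instead of relying on database logs.
        self.audit_trail.append({
            "user": user,
            "action": action,
            "at": datetime.now(timezone.utc).isoformat(),
        })


record = DescribedRecord(
    payload={"customer_id": 1001, "balance": 250.0},
    pool="customer accounts",
    classification="high",
    owner="finance",
)
record.record_access("jsmith", "read")
record.record_access("batch-job", "update")

for entry in record.audit_trail:
    print(entry["user"], entry["action"], entry["at"])
```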
But the business benefits of ILM go beyond legislative compliance and back-up and recovery mechanisms. ILM provides a framework for the efficient management of data, for making it available securely and in a timely manner, and for tracking its usage.
The cost of managing data is high, but data is often inadequately secured and formal policies for its authorised use and proper management often do not exist. This needs to change.
Robin Bloor is chief executive of Bloor Associates