The Freedom of Information Act and other laws are putting pressure on companies to supply data quickly or suffer penalties. Jessica Twentyman reports on rapid data retrieval
The IT team at UK-based holding company Centrica faces some tough deadlines when it comes to retrieving archived data, says Eamonn Forde, the company's storage back-up team manager. "We know that we may be asked to produce auditable evidence of various business transactions. But, at the same time, we don't want our employees to spend hours of time searching for that evidence," he says.
As the parent company of British Gas, One.Tel and the AA, Centrica operates in a number of highly regulated industries. It is not alone - many UK-based IT organisations are in a similar situation. Not only must they store huge volumes of corporate data in order to comply with government legislation, industry regulations and internal corporate governance policies - but they must also be able to retrieve that data on demand, frequently at short notice.
Under the terms of the Data Protection Act, members of the public can demand to see all data about them held by a company. Such a 'subject access request' must be satisfied by the company within 40 calendar days of it receiving the demand.
Public sector bodies must also shoulder the burden of the Freedom of Information Act. From January 2005, government organisations are obliged to make available any information that they hold, with some exceptions, to members of the public within 20 working days of a request. If they fail to meet this deadline, individuals can complain to the Office of the Information Commissioner, a regulator with sweeping powers to impose fines and even short prison sentences for non-compliance.
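The difference between the two deadlines, 40 calendar days and 20 working days, can be illustrated with a short sketch. It makes some simplifying assumptions that are not in the legislation as described here: the day of receipt counts as day zero, working days are Monday to Friday, and bank holidays are ignored. The function names are hypothetical.

```python
from datetime import date, timedelta


def add_calendar_days(received: date, days: int) -> date:
    """Deadline counted in calendar days, with the day of receipt as day zero."""
    return received + timedelta(days=days)


def add_working_days(received: date, days: int) -> date:
    """Deadline counted in working days (Monday to Friday).

    Assumption: bank holidays are ignored, so a real deadline may fall later.
    """
    current = received
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            remaining -= 1
    return current


if __name__ == "__main__":
    received = date(2005, 1, 3)  # illustrative date of receipt
    print("Subject access request due:", add_calendar_days(received, 40))
    print("Freedom of information request due:", add_working_days(received, 20))
```

Under these assumptions the shorter-sounding 20 working days still spans four full weeks, which is why the two limits are closer in practice than the raw numbers suggest.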
There are many other examples. Under the UK Companies Act, organisations must be able to retrieve accounting records dating back six years if called on to do so by company liquidators, inspectors appointed under the Companies Act, Financial Services Authority investigators, the Serious Fraud Office, and representatives from the Inland Revenue and Customs and Excise.
Clearly, organisations face a number of challenges if they are to comply with all legal requirements for data access. These are compounded by the variety of formats in which that data may reside: paper files, scanned documents, e-mails and databases. As a result, many hours of company time can be swallowed up as employees abandon routine tasks in order to search for information.
Take just one example: e-mails. IT market research company the Butler Group last month published the findings of a survey revealing that of 100 IT directors in UK-based companies, 47% would not be able to retrieve an e-mail more than three years old. In the financial services sector - where e-mails must be kept for six years - that figure was 25%.
The repercussions of this could be huge, says the Butler report. "Pharmaceuticals firm Ciba-Geigy was forced to search through 30 million e-mails for a court case, [even] after arguing that the process would be too onerous and time-consuming. In many cases, organisations have preferred to pay a fine rather than search through millions of e-mails."
Faced with the unpalatable choice between searching through files or paying a huge fine, many organisations are seeking new ways to store data so that it can be retrieved at short notice. In response, the technology industry is launching new products which it claims will enable customers to do precisely that.
Such promises should not be taken at face value, warns Simon Gay, consultancy practice leader at IT services company Computacenter. "The link between technology and compliance is far too overstated in my opinion," he says. "Laws and regulations apply regardless of where you store data. What's important is to have clear policies in place so that, when presented with a demand, you know not only whether you have it or not, but also exactly where to look for it." He knows of one company which fell foul of this basic rule: it ended up settling a legal case with a former employee - at considerable expense - because it could not locate information that it knew it had.
Paper documents present a particular problem, not only because they take up physical space but because searching through them is a manual process, which can be costly and time-consuming. For that reason, many companies now use barcodes to track paper documents.
In future, RFID tags may be used to perform similar tasks. A document or file bearing an RFID tag can 'call out' to an RFID reader, indicating where it can be found. However, the relatively high cost of RFID tags at present means they are not yet commercially viable.
Scanned documents, meanwhile, enable organisations to avoid the space problem. Indexing tools provided by document capture software companies such as Captiva enable users to index - or tag - scanned images and assign them to specific, searchable files within a document imaging system.
Digital information raises other problems, partly because of the increasing amount of it being stored, but also as a result of changing IT architectures. "Some years ago, data relating to a particular application or system could be found, in most cases, in storage systems directly attached to that application. With the rise of storage area networks and network-attached storage architectures, that is no longer the case," says Andy Mulholland, global chief technology officer of IT services company Capgemini.
This, he says, is why the industry is seeing a rapid convergence of storage hardware and enterprise content management software - embodied by the acquisition of ECM software company Documentum by storage systems giant EMC in October 2003.
On the hardware side, EMC offers Centera, a content-addressed storage system designed to give users quick access to e-mails, spreadsheets, video, still images, Cad/Cam designs and so on. Storage supplier Network Appliance has also embraced content-addressed storage, although IBM and Hewlett-Packard are focussing their attention on the related field of information lifecycle management.
Content-addressed storage is object-oriented, in that it treats each document as a separate entity. It also uses disc-based Serial ATA technology, which is more easily searchable than removable media such as tape, but less expensive than other disc-based technologies.
When a user creates a document, the application server sends it to the Cas system. The storage system then returns a unique content address to the server. From that point, the application can request the document by submitting the address. The Cas system then retrieves the document, regardless of where it may be physically located.
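The store-and-retrieve exchange described above can be sketched in a few lines of Python. This is a toy, in-memory illustration rather than Centera's interface: the class and method names are hypothetical, and deriving the address from a SHA-256 digest of the content is an assumption, since the article does not say how a Cas system computes its addresses.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressed store (hypothetical, not a vendor API)."""

    def __init__(self) -> None:
        # Maps content address -> document bytes. A real system would hide
        # the physical location of each object behind this lookup.
        self._objects: dict[str, bytes] = {}

    def put(self, document: bytes) -> str:
        """Store a document and return its content address."""
        # Assumption: the address is the SHA-256 digest of the content.
        address = hashlib.sha256(document).hexdigest()
        self._objects[address] = document
        return address

    def get(self, address: str) -> bytes:
        """Retrieve a document using only its content address."""
        return self._objects[address]


if __name__ == "__main__":
    store = ContentStore()
    address = store.put(b"Invoice 1001: gas supply, paid in full")
    print("Content address:", address)
    print("Retrieved:", store.get(address))
```

Because the address in this sketch depends only on the content, storing the same document twice yields the same address, and the application never needs to know which disc holds the object.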
This offers powerful retrieval speeds. "If a bank is looking to retrieve 100,000 e-mails from files in its digital store, with the average e-mail size being 200Kbytes including attachments, retrieval using an online Cas system is much quicker," says Mark Lewis, regional marketing manager at EMC. In this scenario, he claims, it can take on average up to 556 hours [23 days] to retrieve an e-mail archived on tape and 278 hours [12 days] on an optical storage infrastructure. Using Centera, he says, an e-mail can be found in as little as 56 minutes because the system stores and retrieves documents using metadata and is able to search at file level.
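The figures quoted in this scenario can be checked with some simple arithmetic. The sketch below assumes, which the article does not state outright, that each quoted time covers the whole batch of 100,000 e-mails rather than a single message. The constant and function names are hypothetical.

```python
# Figures quoted in the article for the bank scenario.
EMAIL_COUNT = 100_000
AVG_EMAIL_KB = 200

# Assumption: each quoted time covers the whole batch, not a single e-mail.
QUOTED_HOURS = {
    "tape": 556.0,
    "optical": 278.0,
    "online Cas": 56 / 60,  # 56 minutes expressed in hours
}


def total_volume_gb(count: int, avg_kb: int) -> float:
    """Total data volume in decimal gigabytes (1 GB = 1,000,000 KB)."""
    return count * avg_kb / 1_000_000


def hours_to_days(hours: float) -> float:
    """Convert elapsed hours to 24-hour days."""
    return hours / 24


def seconds_per_email(hours: float, count: int) -> float:
    """Average retrieval time per e-mail if the total covers the whole batch."""
    return hours * 3600 / count


if __name__ == "__main__":
    print(f"Total volume: {total_volume_gb(EMAIL_COUNT, AVG_EMAIL_KB):.0f} GB")
    for medium, hours in QUOTED_HOURS.items():
        days = hours_to_days(hours)
        per_email = seconds_per_email(hours, EMAIL_COUNT)
        print(f"{medium}: {days:.1f} days in total, {per_email:.4f} s per e-mail")
```

Read this way, the quoted totals work out at roughly 20 seconds per e-mail on tape, 10 seconds on optical media and a few hundredths of a second on the online system, and the bracketed day counts match the hours once rounded to the nearest day.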
Centera has provided an answer for Forde's team at Centrica, where much of the data it holds on its millions of customers and customer transactions is held in an integrated SAP/Siebel system and in e-mails. Much of that data, he explains, must be archived for rapid retrieval. "These systems are growing by the day. We had to act, otherwise data management and retrieval would quickly have become impossible," Forde says.
For many other companies, however, that situation is fast approaching - if it hasn't arrived already.
Andy Mulholland, Capgemini's global chief technology officer, is speaking on thinking beyond back-up at Storage Expo on 13 October
This article is part of Computer Weekly's Special Report on storage produced in association with Cisco Systems