Don't be an outlaw
Posted: 11:16 04 Oct 2004
The Freedom of Information Act and
other laws are putting pressure on companies to supply data quickly
or suffer penalties. Jessica Twentyman reports on rapid data
retrieval
The IT team at UK-based holding company Centrica faces some tough
deadlines when it comes to retrieving archived data, says Eamonn
Forde, the company's storage back-up team manager. "We know that we
may be asked to produce auditable evidence of various business
transactions. But, at the same time, we don't want our employees to
spend hours of time searching for that evidence," he says.
As the parent company of British Gas, One.Tel and the AA, Centrica
operates in a number of highly regulated industries. It is not
alone - many UK-based IT organisations are in a similar situation.
Not only must they store huge volumes of corporate data in order to
comply with government legislation, industry regulations and
internal corporate governance policies - but they must also be able
to retrieve that data on demand, frequently at short notice.
Under the terms of the Data Protection Act, members of the public
can demand to see all data about them held by a company. Such a
'subject access request' must be satisfied by the company within 40
calendar days of it receiving the demand.
Public sector bodies must also shoulder the burden of the Freedom
of Information Act. From January 2005, government organisations are
obliged to make available any information that they hold, with some
exceptions, to members of the public within 20 working days of a
request. If they fail to meet this deadline, individuals can
complain to the Office of the Information Commissioner, a regulator
with sweeping powers to impose fines and even short prison
sentences for non-compliance.
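To make the two response windows concrete, here is a minimal sketch that computes both deadlines from a receipt date. The receipt date is hypothetical, and the sketch assumes that counting starts on the day after receipt and that a working day is simply Monday to Friday with public holidays ignored; the legislation itself defines these details, so treat the output as illustrative only.

```python
from datetime import date, timedelta


def add_calendar_days(start: date, days: int) -> date:
    """Return the date `days` calendar days after `start`."""
    return start + timedelta(days=days)


def add_working_days(start: date, days: int) -> date:
    """Return the date `days` working days after `start`.

    Assumption: working days are Monday to Friday; public holidays are ignored.
    """
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            remaining -= 1
    return current


received = date(2005, 1, 4)  # hypothetical receipt date (a Tuesday)

# Data Protection Act subject access request: 40 calendar days
sar_deadline = add_calendar_days(received, 40)

# Freedom of Information Act request: 20 working days
foi_deadline = add_working_days(received, 20)

print("Subject access request due:", sar_deadline.isoformat())
print("Freedom of Information request due:", foi_deadline.isoformat())
```

A production version would need a holiday calendar and the exact counting rules set out in each Act.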
There are many other examples. Under the UK Companies Act,
organisations must be able to retrieve accounting records dating
back six years if called on to do so by company liquidators,
inspectors appointed under the Companies Act, Financial Services
Authority investigators, the Serious Fraud Office, and
representatives from the Inland Revenue and Customs and
Excise.
Clearly organisations face a number of challenges if they are to
comply with all legal requirements for data access. These are
compounded by the variety of formats in which that data may reside:
paper files, scanned documents, e-mails and databases. As a result,
many hours of company time can be swallowed up as employees abandon
routine tasks in order to search for information.
Take just one example: e-mails. IT market research company the
Butler Group last month published the findings of a survey
revealing that of 100 IT directors in UK-based companies, 47% would
not be able to retrieve an e-mail more than three years old. In
the financial services sector - where e-mails must be kept for six
years - that figure was 25%.
The repercussions of this could be huge, says the Butler report.
"Pharmaceuticals firm Ciba-Geigy was forced to search through 30
million e-mails for a court case, [even] after arguing that the
process would be too onerous and time-consuming. In many cases,
organisations have preferred to pay a fine rather than search
through millions of e-mails."
Faced with the unpalatable choice between searching through files
or paying a huge fine, many organisations are seeking new ways to
store data so that it can be retrieved at short notice. In
response, the technology industry is launching new products which
it claims will enable customers to do precisely that.
Such promises should not be taken at face value, warns Simon Gay,
consultancy practice leader at IT services company Computacenter.
"The link between technology and compliance is far too overstated
in my opinion," he says. "Laws and regulations apply regardless of
where you store data. What's important is to have clear policies in
place so that, when presented with a demand, you know not only
whether you have it or not, but also exactly where to look for it."
He knows of one company which fell foul of this basic rule: it
ended up settling a legal case with a former employee - at
considerable expense - because it could not locate information that
it knew it had.
Paper documents present a particular problem, not only because they
take up physical space but because searching through them is a
manual process, which can be costly and time-consuming. For that
reason, many companies now use barcodes to track paper
documents.
In future, RFID tags may be used to perform similar tasks. A
document or file bearing an RFID tag can 'call out' to an RFID
reader, indicating where it can be found. However, the relatively
high cost of RFID tags at present means they are not yet
commercially viable.
Scanned documents, meanwhile, enable organisations to avoid the
space problem. Indexing tools provided by document capture software
companies such as Captiva enable users to index - or tag - scanned
images and assign them to specific, searchable files within a
document imaging system.
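The idea of tagging scanned images so they can be found later can be sketched as a simple inverted index from tags to document names. This is not Captiva's software or API; the class, file names and tags below are all hypothetical and serve only to illustrate the principle.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


class TagIndex:
    """Toy inverted index mapping tags to scanned-document IDs."""

    def __init__(self) -> None:
        self._by_tag: Dict[str, Set[str]] = defaultdict(set)

    def add(self, doc_id: str, tags: Iterable[str]) -> None:
        """Record that `doc_id` carries each of `tags` (case-insensitive)."""
        for tag in tags:
            self._by_tag[tag.lower()].add(doc_id)

    def search(self, *tags: str) -> Set[str]:
        """Return the IDs of documents that carry every one of `tags`."""
        if not tags:
            return set()
        matches = [self._by_tag.get(tag.lower(), set()) for tag in tags]
        return set.intersection(*matches)


index = TagIndex()
index.add("scan-0001.tif", ["invoice", "2003", "customer-42"])
index.add("scan-0002.tif", ["contract", "2003"])

print(sorted(index.search("2003")))
print(sorted(index.search("invoice", "2003")))
```

Searching on several tags at once narrows the result to documents that carry all of them, which is what makes a tagged archive quicker to search than an untagged one.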
Digital information raises other problems, partly because of the
increasing amount of it being stored, but also as a result of
changing IT architectures. "Some years ago, data relating to a
particular application or system could be found, in most cases, in
storage systems directly attached to that application. With the rise
of storage area networks and network-attached storage
architectures, that is no longer the case," says Andy Mulholland,
global chief technology officer of IT services company
Capgemini.
This, he says, is why the industry is seeing a rapid convergence of
storage hardware and enterprise content management software -
embodied by the acquisition of ECM software company Documentum by
storage systems giant EMC in October 2003.
On the hardware side, EMC offers Centera, a content-addressed
storage system designed to give users quick access to e-mails,
spreadsheets, video, still images, Cad/Cam designs and so on.
Storage supplier Network Appliance has also embraced
content-addressed storage, although IBM and Hewlett-Packard are
focussing their attention on the related field of information
lifecycle management.
Content-addressed storage is object-oriented, in that it treats
each document as a separate entity. It also uses disc-based Serial
ATA technology, which is more easily searchable than removable
media such as tape, but less expensive than other disc-based
technologies.
When a user creates a document, the application server sends it to
the Cas system. The storage system then returns a unique content
address to the server. From that point, the application can request
the document by submitting the address. The Cas system
then retrieves the document, regardless of where it may be
physically located.
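The store-and-retrieve exchange described above can be sketched in a few lines. The article does not say how Centera derives its content addresses, so this toy in-memory version assumes the address is a SHA-256 hash of the document's bytes; the class and method names are hypothetical.

```python
import hashlib
from typing import Dict


class ContentAddressedStore:
    """Toy in-memory content-addressed store (illustrative only)."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        """Store `content` and return its content address.

        Assumption: the address is the SHA-256 hex digest of the content.
        """
        address = hashlib.sha256(content).hexdigest()
        self._objects[address] = content
        return address

    def get(self, address: str) -> bytes:
        """Return the document stored under `address`."""
        return self._objects[address]


store = ContentAddressedStore()

# The application server sends a document and keeps only the returned address.
address = store.put(b"Invoice 1001 for customer 42")

# Later, the application requests the document by submitting that address.
document = store.get(address)
print(address)
print(document.decode())
```

Because the address in this sketch is derived from the content, storing the same document twice yields the same address, and the caller never needs to know where the bytes physically live.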
This offers powerful retrieval speeds. "If a bank is looking to
retrieve 100,000 e-mails from files in its digital store, with the
average e-mail size being 200Kbytes including attachments,
retrieval using an online Cas system is much quicker," says Mark
Lewis, regional marketing manager at EMC. In this scenario, he
claims, it can take on average up to 556 hours [23 days] to
retrieve an e-mail archived on tape and 278 hours [12 days] on an
optical storage infrastructure. Using Centera, he says, an e-mail
can be found in as little as 56 minutes because the system stores
and retrieves documents using metadata and is able to search at
file-level.
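The figures quoted above can be checked with some quick arithmetic. The sketch below takes the numbers at face value (the article does not spell out whether each time covers one e-mail or the whole batch) and assumes decimal units, with 1Gbyte equal to 1,000,000Kbytes.

```python
EMAILS = 100_000
AVG_SIZE_KB = 200

# Total volume in the scenario, assuming 1 GB = 1,000,000 KB
total_gb = EMAILS * AVG_SIZE_KB / 1_000_000

TAPE_HOURS = 556
OPTICAL_HOURS = 278
CAS_MINUTES = 56

# Convert the quoted hours to days
tape_days = TAPE_HOURS / 24
optical_days = OPTICAL_HOURS / 24

# How many times longer than the quoted Centera figure
tape_ratio = TAPE_HOURS * 60 / CAS_MINUTES
optical_ratio = OPTICAL_HOURS * 60 / CAS_MINUTES

print(f"Total volume: {total_gb:.0f} GB")
print(f"Tape: {tape_days:.1f} days, about {tape_ratio:.0f}x the Cas figure")
print(f"Optical: {optical_days:.1f} days, about {optical_ratio:.0f}x the Cas figure")
```

On these numbers the scenario involves about 20Gbytes of e-mail, and the quoted tape and optical times are roughly 600 and 300 times the quoted Centera time respectively.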
Centera has provided an answer for Forde's team at Centrica, where
much of the data it holds on its millions of customers and customer
transactions is held in an integrated SAP/Siebel system and in
e-mails. Much of that data, he explains, must be archived for rapid
retrieval. "These systems are growing by the day. We had to act,
otherwise data management and retrieval would quickly have become
impossible," Forde says.
For many other companies, however, that situation is fast
approaching - if it hasn't arrived already.
Andy Mulholland, Capgemini's global chief technology officer,
is speaking on thinking beyond back-up at Storage Expo on 13
October
www.storage-expo.com
This article is part of Computer Weekly's Special Report on storage produced in association with Cisco Systems