Data retrieval goes well beyond the scope of
email and database systems to embrace all types of records. While
databases contain structured data and email works with
semistructured data, general data retrieval must address the vast
majority of
unstructured data in the enterprise, including documents,
presentations and all types of media files.
Consequently, finding documents or unstructured records in the
data center has been likened to "finding a needle in a haystack,"
and this poses a serious dilemma for storage administrators
responsible for managing data, handling regulatory compliance
audits and meeting legal discovery requests. Document management
software provides many of the features found with email and
database archive tools, but several key features are essential for
unstructured data. This chapter explains indexing, reporting, data
analysis and policy management.

Understand the business needs

According to a study conducted
by Xiotech Corp., about 90% of U.S. corporations are involved in
some amount of litigation -- often juggling multiple lawsuits at
any one time. In addition, noncompliance with government or
industry regulations can carry penalties, including hefty fines,
sanctions and even jail time. With such serious issues to contend
with, it's easy to make a business case for document/records
management or discovery tools. However, these tools can vary
dramatically in their complexity and features. It's crucial for
corporate stakeholders to understand their exposures and
liabilities, and then evaluate the features that are most relevant
for their industry or specific needs.

Index and search

Large
enterprises can easily possess hundreds of millions of unstructured
files, making it practically impossible to locate specific data
using traditional filename or creation date information. Any
document/records management tool should have a very strong indexing
and search capability.
To make data retrievable from archives, indexing
typically adds specific pieces of metadata to each file.
Metadata goes beyond the basic file system details and can
include a wide array of descriptive information that can easily
be searched -- it's all about preparing a file to be found at a
later date. Comprehensive indexing is usually matched with an
equally powerful search capability. In most cases, searching
will sort through files based on previously created metadata.
For example, a typical search might look for files created by
"M. Smith" on "03 April" with "IPO" in the name or description.
However, search capabilities are increasingly contextual,
looking inside of files to locate important keywords. For
example, a storage administrator dealing with the Securities and
Exchange Commission (SEC) investigation into a brokerage firm
might search all documents from "M. Smith" during "2005"
containing the words "promise," "guaranty" or "returns." With such a
potentially huge volume of records to examine, storage system
performance is also an important consideration. Performance
isn't just an issue with regular metadata searches, but it is
particularly notable with context searches within documents. Lab
testing and evaluation are strongly encouraged to gauge
performance and allow for performance tuning within the storage
infrastructure. Many types of document management software will
output results in a search engine-style format, similar to Google's, but
legal discovery software may also capture and deliver documents
to litigation-oriented software, such as CaseCentral,
Concordance, DB Textworks, Documatrix, Etech, Introspect, JFS
Litigator's Notebook, Lextranet, Nmatrix, Ringtail, Summation
and Virtual Partner. Other tools specialize in organizing data
specifically for the Department of Justice (DOJ), the Federal
Trade Commission (FTC), NASD and SEC investigations.
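
The two search styles described above, a metadata search and a contextual search inside file contents, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Record` fields, the `build_index` and `search` helpers and the sample documents are hypothetical and do not reflect how any named product works.

```python
import re
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    # Hypothetical metadata captured at indexing time.
    path: str
    author: str
    created: date
    description: str
    text: str  # extracted file contents


def build_index(records):
    """Map each lowercase word to the set of record paths that contain it."""
    index = {}
    for rec in records:
        for word in re.findall(r"[a-z0-9]+", rec.text.lower()):
            index.setdefault(word, set()).add(rec.path)
    return index


def search(records, index, author=None, year=None, keywords=()):
    """Filter on metadata, then keep records containing ANY of the keywords."""
    if keywords:
        hits = set()
        for kw in keywords:
            hits |= index.get(kw.lower(), set())
    else:
        hits = {rec.path for rec in records}
    results = []
    for rec in records:
        if rec.path not in hits:
            continue
        if author is not None and rec.author != author:
            continue
        if year is not None and rec.created.year != year:
            continue
        results.append(rec.path)
    return sorted(results)


# Made-up sample data for illustration only.
records = [
    Record("memo1.doc", "M. Smith", date(2005, 4, 3), "IPO briefing",
           "We promise strong returns to every client."),
    Record("memo2.doc", "M. Smith", date(2004, 6, 1), "Lunch menu",
           "Sandwiches on Friday."),
    Record("memo3.doc", "J. Doe", date(2005, 7, 9), "IPO notes",
           "No guaranty of returns is given."),
]
index = build_index(records)
matches = search(records, index, author="M. Smith", year=2005,
                 keywords=["promise", "guaranty", "returns"])
```

Building the word index once and reusing it for every query is the reason contextual searches can stay responsive. Scanning the full text of every file on each request is what makes performance testing so important at enterprise scale.
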

Reporting and data analysis

But searching isn't enough --
consider the reporting and analytical capabilities of your
document management software. You need to have an overview of
the data that is available, including age, type and value
details. This helps storage administrators get a view of the
data they're storing and its adherence to retention policies.
Workflow analysis and auditing capabilities should be able to
document file access and identify users who are interacting
with the organization's data. This can help to protect sensitive
data against unauthorized access and identify users who are
operating outside of corporate workflow policies. For example,
auditing can document which users delete files. Unauthorized
users can then be identified and corrected.
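
As a rough sketch of the reporting and auditing ideas above, the following Python summarizes stored files by type, flags files held past their retention period and lists delete actions by users outside an approved set. The tuple layouts, function names and the notion of an "authorized" user set are assumptions made for illustration, not features of any particular product.

```python
from collections import Counter
from datetime import date


def summarize(files, today):
    """Report file counts by type and files held past their retention period.

    files: list of (path, extension, last_modified, retention_days) tuples
    (a hypothetical layout chosen for this sketch).
    """
    by_type = Counter(ext for _, ext, _, _ in files)
    overdue = [path for path, _, modified, keep_days in files
               if (today - modified).days > keep_days]
    return by_type, overdue


def unauthorized_deletes(audit_log, authorized_users):
    """Return (user, path) pairs for deletes by users outside the approved set.

    audit_log: list of (user, action, path) tuples (hypothetical layout).
    """
    return [(user, path) for user, action, path in audit_log
            if action == "delete" and user not in authorized_users]
```

A report like this would typically run on a schedule, so that administrators see retention drift and unexpected deletions before an audit or discovery request arrives.
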

Policy management and enforcement

Today's glut of unstructured data is also
subject to corporate data retention and deletion policies, so document
management software must allow administrators to manage and
enforce policies so that each data type is retained for the
appropriate period. Retention periods will vary depending on the
data type and the industry. For example, patient records will be
retained far longer than a common corporate memo. Documents are
then securely deleted once the retention period has expired. Any
deletion should be properly documented to avoid accusations of
spoliation (destruction or alteration of evidence). If
litigation is a key concern, software should also provide a
"litigation hold" feature where relevant data is exempted from
deletion. Finally, it's important to note that retention
policies do not come from software. Instead, policies are set
through a comprehensive understanding of government and industry
regulations, along with a thorough knowledge of business
objectives and risk factors. No two businesses will necessarily
have the same retention policies for a given data type. Experts
suggest that it's easier to integrate document management
software in the enterprise when there is a well-defined and
established "paper" retention policy already in place.
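
The retention, documented deletion and litigation hold behavior described in this section can be illustrated with a minimal Python sketch. The retention periods in `RETENTION_DAYS` are placeholders invented for the example, since real periods come from regulations and business policy, and the document dictionaries and `purge_expired` function are likewise hypothetical. The sketch only decides and logs what would be deleted. It does not perform secure deletion.

```python
from datetime import date, timedelta

# Placeholder retention periods in days; NOT real regulatory values.
RETENTION_DAYS = {"patient_record": 3650, "memo": 365}


def purge_expired(documents, holds, today):
    """Split documents into those kept and a log of documented deletions.

    documents: list of dicts with "id", "type" and "created" keys
    holds: set of document ids under litigation hold
    """
    kept, deletion_log = [], []
    for doc in documents:
        expires = doc["created"] + timedelta(days=RETENTION_DAYS[doc["type"]])
        if doc["id"] in holds or today < expires:
            # Held or unexpired data is never deleted.
            kept.append(doc)
        else:
            # Record every deletion to help answer spoliation claims.
            deletion_log.append({
                "id": doc["id"],
                "deleted_on": today,
                "reason": "retention period expired",
            })
    return kept, deletion_log
```

Checking the hold set before the expiry date means a litigation hold always overrides the retention schedule, which is the ordering the text calls for.
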