There is no single uniform approach to disaster planning or
recovery. Each organization must establish plans and implement
tools that are appropriate for its particular business model and
compliance obligations. Regardless of your specific approach,
however, disaster planning is not a one-time academic exercise. In
actual practice, disaster recovery (DR) plans often necessitate
changes in the storage infrastructure and impose other overhead
tasks that must be addressed. DR plans must also be tested and
updated periodically to ensure that disaster plans remain relevant
as the business grows or hardware changes. Let's take a look at the
most pressing DR management issues.
Implementation considerations
Disaster recovery plans typically involve changes to an existing
storage or network infrastructure. Ultimately, a storage
administrator must budget and schedule the hardware, software,
labor and facility costs needed to accommodate DR plans. Hardware
additions may be as simple as adding a tape drive or tape library
but often require more substantial additions, like dedicated
storage systems. One example might be the acquisition of a NearStor
virtual tape library from Network Appliance Inc. or an Axion
backup/recovery system from Avamar Technologies Inc.
In most cases, backups intended for DR purposes are sent to a
remote location. Services like Iron Mountain Inc. can transport
physical tapes to a secure off-site vault, but an increasing number
of organizations are practicing remote replication between storage
systems at two or more locations. For example, a bank may use a WAN
link to replicate data from one EMC Corp. Centera in its main data
center to a secondary Centera located in a backup data center
across the state.
DR doesn't work without software and usually involves one or more
software applications, such as backup, snapshot, mirroring or
replication tools. Some examples include EMC's Symmetrix Remote
Data Facility software designed to replicate Symmetrix systems, as
well as Avamar's Replicator software intended to replicate
heterogeneous systems across a WAN. Whether software is bundled
with the storage system or acquired separately, IT staff must
invest the time to become proficient with each tool. Smart managers
will ensure that key IT personnel have the time to learn each
tool.
Once the DR infrastructure is in place, it takes a serious effort
to establish and maintain the backup. This may require an evening
or weekend to make full backup tapes or synchronize data between
replication sites across a WAN. After the initial replication, an
IT department must allocate the time to tackle incremental tape
backups or nightly replication.
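As a rough illustration of why incremental runs take less time than the initial full backup, the sketch below selects only the files modified since the previous backup. It assumes file modification time is the selection criterion; commercial backup tools usually keep their own catalogs, and the function name `files_changed_since` is hypothetical.

```python
import os
from pathlib import Path


def files_changed_since(root: Path, last_backup_epoch: float) -> list[Path]:
    """Return files under root whose modification time is after the last backup.

    last_backup_epoch is a Unix timestamp (seconds). Assumption: mtime is a
    good enough signal of change for this illustration.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.stat().st_mtime > last_backup_epoch:
                changed.append(path)
    return sorted(changed)
```

A full backup corresponds to passing a timestamp of 0, which selects every file; each later run passes the time of the previous backup.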
Security considerations
You rely on backups to protect you against disaster, but are the
backups themselves vulnerable to disaster? Whenever corporate data
resides outside the direct control of an IT department, it's
important to consider the implications of data security. Any remote
location should start with an evaluation of physical
security.
Tape storage or remote data center equipment should always be kept
under lock and key -- accessible only to a minimum number of
authorized personnel. Fire extinguishers and suppression systems
should use gases that are safe for electronic equipment and
digital media (water-based systems should be avoided). The
geographic location should also be away from flood zones,
earthquake-prone areas and potential terrorist targets. Inspect a remote
facility in advance. If the facility is managed by another company
(such as Iron Mountain), take the time to discuss its security and
disaster plans, and define its liability for your vital data.
The data itself may need to be secured through encryption
techniques. As a rule, only personally identifiable information
must be secured (such as customer records with Social Security or
credit card numbers), though organizations that replicate data
often choose to encrypt all data in order to maintain security
across an open WAN (a.k.a. the Internet). Encryption can be handled
through backup software or implemented through dedicated encryption
appliances integrated into the network such as the DataFort product
family from Decru Inc. See the SearchStorage.com
Tech Roundup on encryption tools.
Testing and training
Even the best DR plan is useless if it cannot be implemented, so
an important part of DR management is periodic testing and
training, bringing new IT personnel up to speed on the DR process
and verifying that recovery is achievable within the specified
recovery time objective (RTO). Recovery drills can be tricky
because they are disruptive -- a production network must be brought
offline and recovered from the very latest disaster backup.
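One simple way to record whether a drill met its target is to compare the measured recovery time against the RTO. The sketch below is a minimal illustration; the function name `meets_rto` and the drill times and four-hour RTO are hypothetical values, not figures from any real plan.

```python
from datetime import datetime, timedelta


def meets_rto(outage_start: datetime, service_restored: datetime,
              rto: timedelta) -> bool:
    """Return True if the measured recovery time is within the RTO."""
    if service_restored < outage_start:
        raise ValueError("service_restored must not precede outage_start")
    return (service_restored - outage_start) <= rto


# Hypothetical drill: systems taken offline at 10 p.m., restored at 1:30 a.m.
drill_start = datetime(2024, 1, 13, 22, 0)
drill_end = datetime(2024, 1, 14, 1, 30)
print(meets_rto(drill_start, drill_end, timedelta(hours=4)))  # True (3.5 h <= 4 h)
```

Logging this result after every drill gives a simple history showing whether recovery times are drifting toward the limit as data volumes grow.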
Some organizations avoid lost production time and the risk of
unexpected problems by practicing with a test (lab) system. That
is, a scaled-down environment is backed up and then recovered using
the same means employed by the production network. While this
tactic does not verify the actual network, it does provide
important practice for IT personnel. Drills often include
discussion time for personnel to evaluate the plan and make any
recommendations to streamline or improve the DR process.
There are no solid guidelines that dictate how often a DR plan
should be tested, though once a year is probably the minimum
frequency. In addition to regularly scheduled testing, additional
testing can be accommodated as needed when personnel turnover
occurs or when changes to the DR plan are implemented. If you do
business with a DR service provider, you may need to
schedule testing time in advance.
Updating the plan
Finally, DR plans are never static. Changes invariably occur with
storage resources, applications, IT personnel and even business
units or corporate practices. As changes take place, the DR plan
must be updated to accommodate those changes. For example, if 200
GB of additional storage capacity is added or a new storage array
is installed, that additional storage must be included in the DR
cycle. As another example, new privacy legislation may require
files to be encrypted where they may not have been encrypted in the
past.
Changes can also have secondary effects on the DR plan. Consider
the 200 GB of additional storage capacity added in the previous
example. Since more storage will take longer to back up and
restore, it may be necessary to consider a different tape technology
or increase the WAN bandwidth to maintain acceptable RTOs. For larger
organizations, a system of change management may be needed to
report on any organizational changes precipitating a possible
adjustment to the DR plan.
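The effect of added capacity on transfer time can be estimated with simple arithmetic. The sketch below assumes decimal units (1 GB = 8,000 megabits) and a hypothetical 80% link efficiency to allow for protocol overhead; the link speeds are illustrative only, and real throughput depends on the backup software, compression and the storage systems at each end.

```python
def transfer_hours(data_gb: float, throughput_mbps: float,
                   efficiency: float = 0.8) -> float:
    """Estimate hours needed to move data_gb over a link of throughput_mbps.

    throughput_mbps is the nominal link speed in megabits per second.
    efficiency is the assumed usable fraction of that speed (0 < e <= 1).
    """
    if throughput_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("throughput must be positive and efficiency in (0, 1]")
    megabits = data_gb * 8 * 1000  # decimal units: 1 GB = 8,000 megabits
    seconds = megabits / (throughput_mbps * efficiency)
    return seconds / 3600


# The 200 GB from the example above, over two hypothetical WAN links.
print(f"{transfer_hours(200, 100):.1f} hours at 100 Mbps")    # 5.6 hours
print(f"{transfer_hours(200, 1000):.1f} hours at 1000 Mbps")  # 0.6 hours
```

If the estimate for the new total capacity no longer fits the available backup window or the recovery target, that is the signal to revisit tape technology or bandwidth.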