

There is no one-size-fits-all business continuity
strategy, so think of disaster recovery scenarios as modules that
can be invoked depending on the situation, says Josh
Krischer
There is no one-size-fits-all approach to developing business
continuity strategies. Adopting someone else's requirements, which
may be based on limitations or regulations that do not apply to
your company, could spell disaster of another type.
Think of business continuity and recovery scenarios as modules
that fit into a broader business continuity plan. When an incident
occurs, it is mapped to the matching business continuity scenario,
which then dictates which recovery plan modules are invoked.
Modules can be reused for various business continuity scenarios.
For example, certain types of disaster will involve making contact
with external authorities, while others will not. Some types of
disaster will require the involvement of a company's PR department,
others will not.
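The modular idea amounts to two lookups: an incident maps to a scenario, and a scenario is a list of reusable modules. The minimal Python sketch below assumes that structure. Every incident, scenario and module name in it is hypothetical and only illustrates the mapping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """A reusable recovery-plan module: a named, ordered list of steps."""
    name: str
    steps: tuple[str, ...]


# Hypothetical modules; a real plan would also hold owners, contacts and procedures.
NOTIFY_AUTHORITIES = Module("notify_authorities", ("contact emergency services", "inform regulator"))
PR_RESPONSE = Module("pr_response", ("brief PR department", "issue public statement"))
FAILOVER_SITE = Module("failover_site", ("declare disaster", "move production to recovery site"))
RESTORE_APP = Module("restore_application", ("restore from point-in-time copy", "verify data integrity"))

# Scenarios are compositions of modules; FAILOVER_SITE is reused by two of them.
SCENARIOS: dict[str, list[Module]] = {
    "physical_site_disaster": [NOTIFY_AUTHORITIES, PR_RESPONSE, FAILOVER_SITE],
    "loss_of_network": [FAILOVER_SITE],
    "application_failure": [RESTORE_APP],
}

# Hypothetical incident classification: several incidents can map to one scenario.
INCIDENT_TO_SCENARIO = {
    "fire": "physical_site_disaster",
    "flood": "physical_site_disaster",
    "wan_link_down": "loss_of_network",
    "database_corruption": "application_failure",
}


def modules_for(incident: str) -> list[str]:
    """Map an incident to its scenario, then return the modules to invoke, in order."""
    scenario = INCIDENT_TO_SCENARIO[incident]
    return [module.name for module in SCENARIOS[scenario]]


print(modules_for("flood"))          # ['notify_authorities', 'pr_response', 'failover_site']
print(modules_for("wan_link_down"))  # ['failover_site']
```

Both scenarios refer to the same FAILOVER_SITE definition. A change to that module's steps is therefore made in one place, not in every plan that uses it.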
Many companies think the end game for business continuity is to
recover the technology infrastructure, such as network, telecoms,
applications and desktops. As a result, they do a fine job of
disaster recovery. Yet if the time comes to execute the plan and
run production at the recovery site, the business itself may still
be unable to operate. Small, seemingly unimportant details need to
be considered by both the business and IT.
Business impact analysis is a critical step as it identifies
what and how much the company has at risk, as well as which
business processes are most critical, thereby prioritising risk
management and recovery investment.
The business continuity team, which has to include the business
process owners, must translate the business requirements into an
overall business continuity plan. Three of the most important
deliverables from a business impact analysis are:
- Recovery time objective (RTO): the length of time between when
a disaster occurs and when the business process must be back in
production mode
- Recovery point objective (RPO): the point in the business
process to which data must be recovered after a disaster occurs,
for example the start of the business day, the last back-up or the
last transaction that was processed
- Cost of downtime: the business should calculate the potential
losses incurred, both as the result of a disaster and in recreating
lost data.
These considerations determine the technologies and methods used
to support the disaster recovery plan.
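One way to make these considerations concrete is a small screening calculation. First discard recovery options that miss the RTO or RPO, then compare the rest by cost. The sketch below makes two assumptions. It treats the RPO as a maximum window of lost data, in hours. It models the cost of downtime as hourly outage losses plus the cost of recreating lost data. The options and all figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryOption:
    name: str
    recovery_time_h: float  # expected time until the process is back in production
    data_loss_h: float      # worst-case window of lost data, in hours of work
    annual_cost: float      # yearly cost of running the option


def meets_objectives(option: RecoveryOption, rto_h: float, rpo_h: float) -> bool:
    """True if the option restores service within the RTO and loses no more data than the RPO allows."""
    return option.recovery_time_h <= rto_h and option.data_loss_h <= rpo_h


def downtime_cost(option: RecoveryOption, loss_per_hour: float, recreate_per_hour: float) -> float:
    """Estimated loss from one disaster: outage losses plus the cost of recreating lost data."""
    return option.recovery_time_h * loss_per_hour + option.data_loss_h * recreate_per_hour


# Illustrative figures only.
OPTIONS = [
    RecoveryOption("tape back-up", recovery_time_h=72, data_loss_h=24, annual_cost=50_000),
    RecoveryOption("asynchronous remote copy", recovery_time_h=4, data_loss_h=0.25, annual_cost=200_000),
    RecoveryOption("synchronous remote copy", recovery_time_h=2, data_loss_h=0, annual_cost=500_000),
]

# Hypothetical business impact analysis result for one process: RTO 8 h, RPO 1 h.
viable = sorted(
    (o for o in OPTIONS if meets_objectives(o, rto_h=8, rpo_h=1)),
    key=lambda o: o.annual_cost,
)
for option in viable:
    print(option.name, downtime_cost(option, loss_per_hour=100_000, recreate_per_hour=20_000))
```

Only the two remote copy options survive the RTO and RPO filter. Weighing their annual cost against the downtime cost each would incur is then a business decision.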
Keeping data losses to a minimum is critical for some
applications. But a more important issue is ensuring data
consistency and integrity at the recovery site.
If the data at the recovery site is not consistent, a
time-consuming restore from back-up is usually required, which may
take days.
Companies need to understand fully how their chosen replication
technology works, what its limitations are, and how it will react
in various disaster scenarios, such as loss of network, physical
site disaster, component failure and application failure. Only then
can they put in place a strategy to assure data recovery with
integrity, and still meet their RPOs.
Also, hunting down conflicting data and reconciling the status
of key information can mean a much longer recovery time. Many
companies mistakenly accept suppliers' claims that their
replication technology will always keep data consistent in a
disaster.
There is no ideal distance between primary and disaster recovery
datacentres. Rather, the best location is the one that minimises
the risks at an acceptable cost and meets any required industry
regulations.
Increasing the distance between the primary and secondary sites
means higher telecoms costs and requires replication techniques
suited to long distances. It may also reduce performance and
increase the chance of disruption. Users should invest in
infrastructure that ensures the availability of resources, such as
power and telecoms, that are usually beyond their control.
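One reason distance reduces performance is propagation delay. With synchronous remote copy, a write is not complete until the remote site acknowledges it, so every write pays at least one round trip over the link. Light in optical fibre travels at roughly 200,000 km/s, about 5 microseconds per km, which gives the back-of-envelope sketch below. It assumes route_km is the length of the fibre path, which is usually longer than the straight-line distance between sites. The round_trips parameter is also an assumption, because the number of round trips a write needs depends on the replication protocol.

```python
# Light in optical fibre travels at roughly 200,000 km/s, i.e. about 5 microseconds per km.
FIBRE_DELAY_US_PER_KM = 5.0


def added_write_latency_ms(route_km: float, round_trips: int = 1) -> float:
    """Lower bound on the delay a synchronous remote write adds, in milliseconds.

    Counts only propagation over the fibre route (out and back, per round trip);
    switching, protocol and storage-controller overheads are ignored.
    """
    return 2 * route_km * FIBRE_DELAY_US_PER_KM * round_trips / 1000


for km in (10, 100, 1000):
    print(f"{km:>5} km: at least {added_write_latency_ms(km):.1f} ms per write")
```

Over 100 km of fibre, that is already about 1 ms added to every synchronous write before any equipment overhead is counted. This is why distance weighs most heavily on synchronous remote copy.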
In most cases, regardless of the distance between the sites,
each datacentre should have a separate main power supply (different
providers or at least different transformers) and separate telecoms
paths.
It would be even better if each datacentre had redundant power
generators and an uninterruptible power supply. If both sites are
connected by fibre optic cables, redundancy should be provided by
using two separate routes.
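The separation rules above lend themselves to a simple audit of an infrastructure inventory. The sketch below checks three things. It flags a shared power supply, meaning the same provider and the same transformer. It flags any telecoms path that both sites have in common. It also checks that the fibre between the sites follows at least two distinct routes. The Site fields, route identifiers and example sites are hypothetical, and generators and UPS are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    name: str
    power_provider: str
    transformer: str                # supply point the site is fed from
    telecom_routes: frozenset[str]  # physical cable paths serving the site


def shared_failure_points(a: Site, b: Site) -> list[str]:
    """List infrastructure that both datacentres depend on."""
    issues = []
    # Different providers, or at least different transformers, count as separate supplies.
    if (a.power_provider, a.transformer) == (b.power_provider, b.transformer):
        issues.append(f"same power supply: {a.power_provider}/{a.transformer}")
    for route in sorted(a.telecom_routes & b.telecom_routes):
        issues.append(f"shared telecoms path: {route}")
    return issues


def intersite_fibre_redundant(routes: list[str]) -> bool:
    """Fibre between the sites should follow at least two separate routes."""
    return len(set(routes)) >= 2


# A deliberately flawed pair of hypothetical sites.
primary = Site("primary", "PowerCo", "T-14", frozenset({"north-duct", "east-duct"}))
recovery = Site("recovery", "PowerCo", "T-14", frozenset({"east-duct", "south-duct"}))
print(shared_failure_points(primary, recovery))
print(intersite_fibre_redundant(["ring-A", "ring-A"]))
```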
When using storage controller replication, it is important to
keep two copies at the recovery site: a main copy (the target of
the replication) and a point-in-time copy, or snapshot. If the
remote copy operation is suspended, due to a transmission problem,
for example, data at the primary site will be modified but not
transferred to the secondary site. When the connection is
re-established, resynchronisation will send the modifications, but
not in the order in which the writes were issued.
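The ordering problem described above can be shown with a toy model. The sketch assumes that while replication is suspended the primary records only which blocks changed, in the style of a change bitmap, so resynchronisation sends those blocks in address order. Block numbers, contents and the integrity rule are hypothetical. In the example, an index entry in block 2 depends on a record written earlier to block 7.

```python
def resync_order(changed_blocks: set[int]) -> list[int]:
    """Blocks sent during resynchronisation.

    Assumed model: while suspended, the primary only records *which* blocks
    changed, so resync sends them in address order, not write order.
    """
    return sorted(changed_blocks)


def secondary_after(primary: dict[int, str], secondary: dict[int, str],
                    changed: set[int], blocks_sent: int) -> dict[int, str]:
    """Secondary contents if a disaster stops resync after `blocks_sent` blocks."""
    image = dict(secondary)
    for block in resync_order(changed)[:blocks_sent]:
        image[block] = primary[block]
    return image


def is_consistent(image: dict[int, str]) -> bool:
    """Hypothetical integrity rule: the index may point at block 7 only once the new record is there."""
    return image[2] != "index->7" or image[7] == "record v2"


# Writes issued at the primary while the link was down, in this order:
#   1. block 7 <- "record v2"   (new data)
#   2. block 2 <- "index->7"    (index entry that depends on write 1)
primary = {2: "index->7", 7: "record v2"}
secondary = {2: "index->none", 7: "record v1"}
changed = {2, 7}

for sent in range(3):
    image = secondary_after(primary, secondary, changed, sent)
    print(sent, image, is_consistent(image))
```

Before resynchronisation starts and after it finishes, the secondary is consistent. After one block has been sent, the index already points at a record that has not arrived. Replaying the writes in their original order (block 7, then block 2) would not leave any such gap.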
If a disaster strikes during resynchronisation, the data at the
secondary site will be inconsistent. To avoid this, the target
(secondary disc) and the point-in-time copy should be split before
resynchronisation begins. Then, if a disaster strikes during
resynchronisation, the data on the secondary disc may not be
consistent, but the point-in-time copy will contain the last
consistent image. The local copy is re-established once
resynchronisation is complete.
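The split-then-resynchronise procedure can be sketched as a small state machine. The sketch assumes the point-in-time copy is a local full-volume copy that follows the target until it is split. The RecoverySite class and its methods are hypothetical and do not model any vendor's storage controller.

```python
class RecoverySite:
    """Toy recovery site: a replication target plus a local point-in-time copy."""

    def __init__(self, image: dict[int, str]):
        self.target = dict(image)    # secondary disc, receives replicated writes
        self.pit_copy = dict(image)  # local copy holding the last consistent image
        self.split = False           # True while the local copy is frozen

    def begin_resync(self) -> None:
        """Split the local copy from the target before resynchronisation starts."""
        self.split = True

    def receive(self, block: int, value: str) -> None:
        """Apply a replicated write; the local copy follows only while not split."""
        self.target[block] = value
        if not self.split:
            self.pit_copy[block] = value

    def finish_resync(self) -> None:
        """Target is consistent again: re-establish the local copy from it."""
        self.pit_copy = dict(self.target)
        self.split = False

    def consistent_image(self) -> dict[int, str]:
        """Image to recover from: the frozen copy while resync is still in progress."""
        return dict(self.pit_copy if self.split else self.target)


site = RecoverySite({2: "index->none", 7: "record v1"})
site.begin_resync()
site.receive(2, "index->7")     # resync sends blocks out of write order
print(site.consistent_image())  # a disaster now would recover from the frozen copy
site.receive(7, "record v2")
site.finish_resync()
print(site.consistent_image())
```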
Another best practice is to run the recovery process from the
point-in-time copy rather than the secondary disc, because data can
be damaged during recovery. If recovery is done from the local
point-in-time copy, the source data will not be damaged, and a new
local copy can be made at any time.
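The point of recovering from a copy is that a failed attempt is cheap to retry. The sketch below runs a recovery procedure on a scratch copy of the point-in-time image, so the image itself is never written. The recovery functions are hypothetical stand-ins for steps such as log replay or repair.

```python
from typing import Callable

Image = dict[int, str]


def recover_from_copy(pit_image: Image, recovery: Callable[[Image], None]) -> Image:
    """Run a recovery procedure on a scratch copy of the point-in-time image.

    If the procedure fails part-way, only the scratch copy is damaged; the
    point-in-time image is untouched, so a fresh copy can be made and recovery retried.
    """
    working = dict(pit_image)  # shallow copy is enough: block contents are immutable strings
    recovery(working)
    return working


def interrupted_repair(image: Image) -> None:
    """Hypothetical recovery step that corrupts its input and then fails."""
    image[7] = "half-written"
    raise RuntimeError("recovery aborted")


pit = {2: "index->none", 7: "record v1"}
try:
    recover_from_copy(pit, interrupted_repair)
except RuntimeError:
    pass
print(pit)  # still the last consistent image, ready for another attempt
```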
Businesses should ensure that, for synchronous remote copy, the
bandwidth exceeds peak data transfer requirements. For asynchronous
remote copy, bandwidth sized for average activity is sufficient.
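The sizing rule above can be turned into arithmetic from measured write volumes. The sketch assumes the primary's writes are sampled as megabytes per fixed interval. It ignores protocol overhead, compression and headroom for growth, and the sample values are hypothetical.

```python
def required_bandwidth_mbit_s(mb_written: list[float], interval_s: float, synchronous: bool) -> float:
    """Replication link bandwidth, in Mbit/s, following the sizing rule above.

    mb_written holds the megabytes written in each measurement interval.
    Synchronous remote copy is sized for the busiest interval; asynchronous for the average.
    """
    rates = [mb * 8 / interval_s for mb in mb_written]  # megabytes -> megabits per second
    return max(rates) if synchronous else sum(rates) / len(rates)


# Hypothetical one-minute write samples from the primary site, in MB.
samples = [300, 600, 1500, 600]
print(required_bandwidth_mbit_s(samples, 60, synchronous=True))   # 200.0 (peak)
print(required_bandwidth_mbit_s(samples, 60, synchronous=False))  # 100.0 (average)
```

For this workload, the synchronous link must be twice the size of the asynchronous one, because a single busy minute sets the requirement.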
Josh Krischer is research vice-president at analyst firm
Gartner
Developing a business continuity plan
Business continuity management and disaster recovery planning are
hard work because they mean addressing every aspect of business
operations in the planning, development and testing phases.
- Start strongly: know what is required, and what is not, by
conducting a business impact analysis
- Apply an integrated business and IT approach for recovery plan
development, management and testing
- Reduce the maintenance of a continuity plan by using modular
scenarios for disaster and recovery
- Assure the integrity of data at the secondary site through
proper planning and testing, and by keeping point-in-time
copies.
Source: Gartner