Strategy Clinic: How can you test disaster recovery plans?
- Posted: 15:56 09 Jun 2004
- Topics: Business Continuity
Our disaster recovery plan works in theory. I would like to be sure it works in practice. We have tested parts of it but the tests bear little resemblance to how a real crisis would affect us. Can the panel advise where the greatest vulnerabilities lie in continuity planning and how to test the plan thoroughly?
Tell the management and then switch it all off
There is only one way to test a disaster recovery plan: you turn
all the systems off and then turn them on again. Obviously, choose
the time of lowest demand but with the maximum time possible to
unscramble things if the systems do not hum quietly back to life.
Back everything up first, and have a contingency plan ready in case
your disaster recovery fails to bring them back.
Inform top management in advance and get them to accept the risk
of such a trial, weighed against what would happen if an outage
occurred "naturally", unscheduled and unannounced.
You will age five months in five hours while you are doing this,
and age five years if those little lights do not flicker back
on.
Robin Laidlaw, President, CW500 Club
Simulation can cause more disruption than real life
A real disaster will not be the same as any of the tests you have
done. How much does that matter? How realistic does testing need to
be?
Disaster recovery is part of the business's response to risk. It is
an investment - often a large one - the purpose of which is to
mitigate the risks. A risk analysis should be done first to
establish the likelihood of an event taking place, the cost to the
business of such an event and the amount and type of mitigation
that is called for. All of these factors are variable, and many
permutations are possible.
The cost of a disaster can then be compared with the mitigation
cost. The relevance of this to your question is that the cost of
even more testing may be greater than the additional protection it
gives. Taking this to its logical conclusion, one could damage the
business more by extensive, intrusive, simulated disaster testing,
which causes more disruption than would be suffered in a real
disaster.
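As a rough illustration of that comparison, the arithmetic can be
sketched in a few lines of Python. Every figure here is invented for
the example, and the two helper functions are hypothetical names, not
part of any standard method: the point is only that each extra layer
of mitigation or testing should save more in expected loss than it
costs.

```python
# Hypothetical figures for illustration only; none come from a real business.

def expected_annual_loss(probability_per_year, cost_if_it_happens):
    """Expected yearly loss from one type of event."""
    return probability_per_year * cost_if_it_happens


def worth_doing(extra_cost, loss_before, loss_after):
    """True when the reduction in expected loss exceeds the extra spend."""
    return (loss_before - loss_after) > extra_cost


# Assumed: a 5% yearly chance of an outage costing 2,000,000 with no plan.
unmitigated = expected_annual_loss(0.05, 2_000_000)
# Assumed: the current recovery plan limits the cost of that outage to 400,000.
with_plan = expected_annual_loss(0.05, 400_000)
# Assumed: a further full-scale test would limit it to 300,000.
with_extra_test = expected_annual_loss(0.05, 300_000)

# Assumed: the plan costs 60,000 a year to maintain.
print(worth_doing(60_000, unmitigated, with_plan))
# Assumed: the extra test costs 30,000 a year in disruption.
print(worth_doing(30_000, with_plan, with_extra_test))
```

With these assumed numbers the plan pays for itself, but the extra
intrusive test does not: it removes far less expected loss than the
disruption it causes.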
Such a calculation may lead you to conclude that your disaster
recovery plans, rather than being inadequate, are actually
excessive, not in absolute terms but relative to what your business
needs; what mitigation is actually achievable; and how much it
costs.
Most businesses do not approach disaster recovery in this way.
Generally this results in an inadequate level of disaster recovery
preparation, rather different from the picture I have drawn
here.
Roger Marshall, BCS Elite Group
Evaluate risk assessment and business impact
Key vulnerabilities in business continuity planning are the
relevance of the plans to the organisation, the positioning of
continuity and, as the question highlights, appropriate
testing.
Many organisations believe they have covered business continuity
because they have a published plan, but often these are out of date
or do not have a nominated owner or method of review. Regular
formal risk assessment and business-impact analysis should drive
the critical elements of the plan and highlight the main areas that
need testing. Where there are organisational changes, the impact
should be revised and the plans updated - this should include
changes in personnel.
Business continuity is often thought of as an IT issue but it
should be considered from a business perspective. IT plays an
important role in the provision of systems and services but plans
must consider issues such as access to buildings, power supplies
and external threats.
The process should involve individuals from all key business areas
and highlight where interrelationships exist. A key distinction
should be made between disaster recovery and longer-term business
continuity, particularly with regard to responsibilities.
Testing needs to consider wider issues such as time of day and
third-party reliance. Many tests are performed outside office hours
to avoid disruption and may not truly reflect the working
environment. Equally, if the plans are focused solely on the
business and not other organisations in the supply chain, this
could have a major impact on long-term recovery. Clearly a full
test can be hugely disruptive, so consider testing components in
isolation, but make sure it all fits together.
Richard Woods, NCC Group
Test everything, especially with end-users involved
You don't say which parts you have tested or how, so it is
difficult to be specific. However, I would give the following
general advice:
- Work on the basis that untested plans don't work, so declining to test a plan because it seems difficult or costly should not be an option
- Start with a rigorous desktop review and consider all the things that might go wrong, involving a variety of people from your business
- The closer a test is to simulating a real-life incident the better, although this can be both expensive and unpopular. The more rehearsals there are, the greater the likelihood that people will respond correctly when the real thing happens
- Be sure to have independent reviewers/observers with business continuity or disaster recovery experience to help with all phases of testing and design of improvements. Users may try to get away with small cheats or ask for hints in tests. It is important that no leeway is given - you really need to know what could go wrong.
Apart from a lack of rigorous testing, the other challenges, and therefore potentially the greatest vulnerabilities, are:
- Lack of visible business sponsorship and ownership
- Failure to agree cross-business on criticality of systems
- Inadequate validation of required and feasible recovery timescales between IT and business units
- Undue reliance on a plan that will not do what people assume it will because it has not been validated or tested.
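The point about recovery timescales lends itself to a simple check.
The sketch below, in Python, compares what business units say they
need with what IT believes it can deliver. The system names, the
hours and the `recovery_gaps` helper are all made up for
illustration.

```python
# System names and hours are hypothetical.
required_hours = {"order entry": 4, "payroll": 48, "email": 8}   # stated by business units
feasible_hours = {"order entry": 12, "payroll": 24, "email": 8}  # estimated by IT


def recovery_gaps(required, feasible):
    """Return systems whose feasible recovery time is missing or too slow.

    Each entry maps a system to (required hours, feasible hours or None).
    """
    gaps = {}
    for system, need in required.items():
        can_do = feasible.get(system)
        if can_do is None or can_do > need:
            gaps[system] = (need, can_do)
    return gaps


print(recovery_gaps(required_hours, feasible_hours))
```

Any system that appears in the result is one where the two sides have
not yet agreed on what recovery will really look like.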
John Butters, Partner, Ernst & Young
Take the scenario approach and carry out a dry run
IT disaster recovery and business continuity planning are very
important topics at this time. Do you have responsibility for both
the IT and the business aspects or just the IT elements? The tests
will be more realistic if they combine elements of IT and business
recovery.
Of course, one way to fully test the plan is to engineer a recovery
situation of which people have no advance knowledge. However, this
will affect the operational business process and be a high risk if
your plans are not as effective as you hope.
Alternatively, a scenario-based approach to testing your plan would
give you more confidence. Identify some scenarios that are most
likely and some that would give you the most pain. You can then
dry-run the probable effects of, and responses to, these scenarios
in selected areas of your business. Make sure you use someone
independent of the original team to challenge whether your
organisation is ready.
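One way to shortlist scenarios is sketched below in Python. The
scenarios, the 1-to-5 scoring scales and the `pick_for_dry_run`
helper are assumptions made for the example; in practice the scores
would come out of your own risk assessment.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (expected)
    pain: int        # assumed scale: 1 (minor) to 5 (severe)


def pick_for_dry_run(scenarios, per_list=2):
    """Take the most likely and the most painful scenarios, without duplicates."""
    most_likely = sorted(scenarios, key=lambda s: s.likelihood, reverse=True)[:per_list]
    most_painful = sorted(scenarios, key=lambda s: s.pain, reverse=True)[:per_list]
    chosen = []
    for scenario in most_likely + most_painful:
        if scenario not in chosen:
            chosen.append(scenario)
    return chosen


# Hypothetical scenarios and scores.
scenarios = [
    Scenario("single workstation failure", 5, 1),
    Scenario("server room power failure", 4, 3),
    Scenario("office building inaccessible", 2, 5),
    Scenario("key supplier outage", 3, 4),
    Scenario("corrupted backup found on restore", 1, 5),
]

for scenario in pick_for_dry_run(scenarios):
    print(scenario.name)
```

Picking from both lists keeps the rare but severe events on the
agenda alongside the routine ones.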
The results should identify potential weaknesses, which may lie in
the plan, the roles and responsibilities, the documentation or the
technology.
This is a good opportunity to engage with your business users and
your IT suppliers. It will give a broader perspective and
ultimately increase your confidence in the recovery plan.
Sharm Manwani, Henley Management College