Disaster recovery training, awareness and testing

Strategies for disaster recovery training, awareness and testing are vital elements of the disaster recovery planning process that ensure plans you have developed can be executed.

Once you have developed detailed disaster recovery plans on the basis of risk and business impact assessments, the time has come to engage in disaster recovery training, awareness and testing. It is in these phases that the key element in the execution of your plans, the people in your organisation, will learn about their roles in recovering from unplanned IT outages and rehearse for such an event.

In this interview, SearchStorage.co.UK Bureau Chief Antony Adshead speaks with Paul Kirvan, board member with the Business Continuity Institute, about the key pitfalls in disaster recovery training, awareness and testing and how to avoid them, plus the secrets of disaster recovery testing.

Read the transcript or listen to the podcast on disaster recovery training, awareness and testing.

SearchStorage.co.UK: What are the key pitfalls to avoid in developing awareness, training and testing programs for DR?

Kirvan: The two most important criteria for success in developing and implementing IT disaster recovery awareness, training and testing programs are senior management support and funding. If you can’t get that level of approval, it’s going to be difficult, if not downright impossible, to complete any of these activities. Now, if you’ve got to the point where you are ready to start testing your technology disaster recovery plans, we can probably assume that you have already secured management support and funding.

It’s a good idea to keep senior management updated on your DR activities, especially when you are approaching the testing phase of the program. If your testing plans go beyond desktop walk-throughs to system-level tests, advise senior management of these plans, especially if the systems you plan to test are critical enough that a disruption will arouse their ire. It may be worthwhile to brief senior management on your plans, not only in the IT department but also higher in the organisation, so there will be no surprises and you can adjust your test scope, objectives and expectations if needed.

Additional pitfalls to avoid are:

1) setting unreasonable (and unaffordable) goals for awareness, training and testing programs;
2) not engaging your HR organisation in all aspects of awareness and training;
3) not coordinating with all relevant ICT departments when planning and scheduling tests;
4) not coordinating with business unit leaders whose operations may be affected by a test;
5) not coordinating with vendors and service providers regarding your tests;
6) not leveraging vendor and service provider resources in your tests;
7) not having updated test scripts;
8) not preparing the testing team adequately to support the test (for example, who handles what part of the test, who documents the test results, who minds the time);
9) not having a documented test plan in addition to test scripts; and
10) scheduling the test when other ICT tests may have been scheduled.

SearchStorage.co.UK: How often do I need to actually test my concrete DR plan, and what are the secrets of successful testing?

Kirvan: The principal secret to effective testing is to test often, or at least as often as possible, depending on the assets to be tested and their criticality to the organisation, as identified in the business impact analysis, or BIA. Most IT disaster recovery professionals advocate annual testing of critical systems, applications, network assets and facilities.

Some systems, such as specialised financial applications, may have unusually short recovery time objectives, such as under four hours or maybe even within one hour. These specialised applications and their associated servers and data storage devices may need to be exercised more frequently, such as twice a year or maybe even quarterly. Be sure that your testing reflects system and infrastructure criticality as defined in your business impact analysis. Systems of low criticality and priority may only need testing annually or possibly on 18-month or 24-month cycles.
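As a rough illustration of tying test frequency to criticality, the following Python sketch maps a system's recovery time objective (RTO) and BIA criticality to a test interval and computes the next due date. The function names, the "low" criticality label and the exact thresholds are assumptions made for this example; the interview gives only ranges (quarterly to twice a year for very short RTOs, annually for critical systems, up to 18 or 24 months for low-priority ones), so adjust the numbers to your own BIA.

```python
import calendar
from datetime import date


def interval_months(rto_hours: float, criticality: str) -> int:
    """Suggest months between DR tests (illustrative thresholds only)."""
    if rto_hours <= 1:
        return 3   # assumed: RTO within one hour -> quarterly
    if rto_hours < 4:
        return 6   # assumed: RTO under four hours -> twice a year
    if criticality == "low":
        return 18  # assumed: low criticality -> 18-month cycle
    return 12      # everything else -> annual testing


def add_months(start: date, months: int) -> date:
    """Add calendar months, clamping the day to the target month's length."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def next_test_due(last_test: date, rto_hours: float, criticality: str) -> date:
    """Return the date by which the next DR test should be scheduled."""
    return add_months(last_test, interval_months(rto_hours, criticality))


if __name__ == "__main__":
    # A hypothetical trading application with a 30-minute RTO, last tested in March.
    print(next_test_due(date(2024, 3, 15), rto_hours=0.5, criticality="high"))
    # prints 2024-06-15
```

Keeping the thresholds in one small function makes it easy to review them with management and to change them when the BIA is updated.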

Another secret is to document all aspects of your test, especially components like recovery, failover and failback scripts that are necessary to complete system recoveries. Make sure that any other technical information you need for the recovery is documented and readily accessible. Good documentation also provides an audit trail of activities so you will be in a good position to review your results with management. It also makes it easier for members of your DR team to handle recoveries if key team members are not available.

After completing a test, conduct a post-test review as soon as possible, so that the results will still be fresh in the participants’ minds. Document the test results, such as what worked, what didn’t work and lessons learned, and identify post-test activities, such as updating DR plans based on test outcomes. It may even be worthwhile to conduct a follow-on test to verify that the recommended changes from the first test actually work and that they eliminate the problems previously encountered.
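One possible way to capture a post-test review in a consistent form is a small record structure like the Python sketch below. The class and field names are hypothetical and not taken from any standard or tool; the sketch simply records what worked, what didn't, lessons learned and follow-up actions, and flags when a follow-on test may be worth scheduling.

```python
from dataclasses import dataclass, field


@dataclass
class StepResult:
    """Outcome of one step in a DR test script."""
    name: str
    worked: bool
    notes: str = ""


@dataclass
class PostTestReview:
    """Record of a DR test review (hypothetical structure)."""
    system: str
    steps: list[StepResult] = field(default_factory=list)
    lessons_learned: list[str] = field(default_factory=list)
    follow_up_actions: list[str] = field(default_factory=list)

    def failed_steps(self) -> list[StepResult]:
        return [s for s in self.steps if not s.worked]

    def needs_follow_on_test(self) -> bool:
        # Assumption: any failed step justifies a follow-on test.
        return bool(self.failed_steps())

    def summary(self) -> str:
        worked = len(self.steps) - len(self.failed_steps())
        lines = [f"{self.system}: {worked}/{len(self.steps)} steps worked"]
        lines += [f"FAILED: {s.name} ({s.notes})" for s in self.failed_steps()]
        lines += [f"ACTION: {a}" for a in self.follow_up_actions]
        return "\n".join(lines)


if __name__ == "__main__":
    review = PostTestReview(
        system="payments-db",  # hypothetical system name
        steps=[
            StepResult("failover to standby", worked=True),
            StepResult("failback to primary", worked=False, notes="script out of date"),
        ],
        lessons_learned=["Failback script was not updated after the last upgrade"],
        follow_up_actions=["Update failback script in the DR plan"],
    )
    print(review.summary())
    print("Follow-on test needed:", review.needs_follow_on_test())
```

A record like this also doubles as part of the audit trail when you review results with management.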

Be sure to review test results with management, not only with senior IT management but also with the business unit leaders whose systems were tested. You can be sure they will be just as interested as your internal management in your test results, and in what you plan to do if anything was identified that needs fixing.

In addition, keep an eye on tools that can help with system tests, such as applications that facilitate system failovers and failbacks. Examples of these are Double-Take Software and solutions from the Neverfail Group.

In short, successful testing requires a lot of planning and preparation, a lot of coordination, senior management support, good documentation, and even a little bit of luck.
