Formulating a detailed recovery plan is the main aim of the entire IT disaster recovery planning project. It is in these plans that you will set out the detailed steps needed to recover your IT systems to a state in which they can support the business after a disaster.
But before you can generate that detailed recovery plan, you’ll need to perform a risk assessment (RA) and/or business impact analysis (BIA) to identify the IT services that support the organisation’s critical business activities. Then, you’ll need to establish recovery time objectives (RTOs) and recovery point objectives (RPOs).
Once this work is out of the way, you’re ready to move on to developing disaster recovery strategies, followed by the actual plans. Here we’ll explain how to write a disaster recovery plan as well as how to develop disaster recovery strategies.
Regarding disaster recovery strategies, ISO/IEC 27031, the global standard for IT disaster recovery, states, “Strategies should define the approaches to implement the required resilience so that the principles of incident prevention, detection, response, recovery and restoration are put in place.” Strategies define what you plan to do when responding to an incident, while plans describe how you will do it.
Once you have identified your critical systems, RTOs, RPOs, etc, create a table, as shown below, to help you formulate the disaster recovery strategies you will use to protect them.
Table 1: Determining strategies
You’ll want to consider issues such as budgets, management’s position with regard to risks, the availability of resources, costs versus benefits, human constraints, technological constraints and regulatory obligations.
Let’s examine some additional factors in strategy definition.
People. This involves availability of staff/contractors, training needs of staff/contractors, duplication of critical skills so there can be a primary and at least one backup person, available documentation to be used by staff, and follow-up to ensure staff and contractor retention of knowledge.
Physical facilities. Areas to look at are availability of alternate work areas within the same site, at a different company location, at a third-party-provided location, at employees’ homes or at a transportable work facility. Then consider site security, staff access procedures, ID badges and the location of the alternate space relative to the primary site.
Technology. You’ll need to consider access to equipment space that is properly configured for IT systems, with raised floors, for example; suitable heating, ventilation and air conditioning (HVAC) for IT systems; sufficient primary electrical power; suitable voice and data infrastructure; the distance of the alternate technology area from the primary site; provision for staffing at an alternate technology site; availability of failover (to a backup system) and failback (return to normal operations) technologies to facilitate recovery; support for legacy systems; and physical and information security capabilities at the alternate site.
Data. Areas to look at include timely backup of critical data to a secure storage area in accordance with RTO/RPO requirements, method(s) of data storage (disk, tape, optical, etc), connectivity and bandwidth requirements to ensure all critical data can be backed up in accordance with RTO/RPO time scales, data protection capabilities at the alternate storage site, and availability of technical support from qualified third-party service providers.
Suppliers. You’ll need to identify and contract with primary and alternate suppliers for all critical systems and processes, and even the sourcing of people. Key areas where alternate suppliers will be important include hardware (such as servers, racks, etc), power (such as batteries, uninterruptible power supplies, power protection, etc), networks (voice and data network services), repair and replacement of components, and multiple delivery firms (FedEx, UPS, etc).
Policies and procedures. Define policies for IT disaster recovery and have them approved by senior management. Then define step-by-step procedures to, for example, initiate data backup to secure alternate locations, relocate operations to an alternate space, recover systems and data at the alternate sites, and resume operations at either the original site or at a new location.
Finally, be sure to obtain management sign-off for your strategies. Be prepared to demonstrate that your strategies align with the organisation’s business goals and business continuity strategies.
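To make the data considerations above more concrete, here is a minimal Python sketch of the arithmetic behind "can all critical data be backed up within RTO/RPO time scales". The function names, the 0.7 link-efficiency default and the simple worst-case model (backup interval plus transfer time) are illustrative assumptions, not figures taken from any standard; substitute your own measurements.

```python
def transfer_hours(data_gb: float, bandwidth_mbps: float, efficiency: float = 0.7) -> float:
    """Hours needed to copy data_gb gigabytes over a link of bandwidth_mbps megabits/s.

    efficiency is an assumed fraction of the nominal bandwidth that is
    actually usable for backup traffic (hypothetical default of 0.7).
    """
    if data_gb < 0:
        raise ValueError("data_gb must not be negative")
    if bandwidth_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("bandwidth must be positive and efficiency in (0, 1]")
    megabits = data_gb * 8 * 1000  # decimal units: 1 GB = 1000 MB = 8000 megabits
    seconds = megabits / (bandwidth_mbps * efficiency)
    return seconds / 3600


def meets_rpo(backup_interval_hours: float, data_gb: float, bandwidth_mbps: float,
              rpo_hours: float, efficiency: float = 0.7) -> bool:
    """Rough check: is the worst-case age of the off-site copy within the RPO?

    Assumed model: the newest complete off-site copy can be as old as one
    backup interval plus the time the transfer itself takes.
    """
    worst_case = backup_interval_hours + transfer_hours(data_gb, bandwidth_mbps, efficiency)
    return worst_case <= rpo_hours


if __name__ == "__main__":
    # Hypothetical example: 500 GB of critical data over a 1 Gbps link.
    print(f"Transfer time: {transfer_hours(500, 1000):.2f} hours")
    print("Daily backup meets 24-hour RPO:", meets_rpo(24, 500, 1000, rpo_hours=24))
    print("4-hourly backup meets 24-hour RPO:", meets_rpo(4, 500, 1000, rpo_hours=24))
```

A calculation like this is only a first pass. It can show early on whether a proposed backup schedule and link are plausible before you commit to a strategy.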
Once your disaster recovery strategies have been developed, you’re ready to translate them into disaster recovery plans. Let’s take Table 1 and recast it into Table 2, seen below. Here we can see the critical system and associated threat, the response strategy and (new) response action steps, as well as the recovery strategy and (new) recovery action steps. This approach can help you quickly drill down and define high-level action steps.
Table 2: Using strategies to create plan
From Table 2 you can expand the high-level steps into more detailed step-by-step procedures, as you deem necessary. Be sure they are linked in the proper sequence.
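If you keep this information in a script or spreadsheet export rather than a document, one row of Table 2 might be modelled along the following lines. This is a sketch only: the class name, field names and the example row are hypothetical, chosen to mirror the columns described above (critical system, threat, response strategy and steps, recovery strategy and steps).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PlanEntry:
    """One row of a strategy-to-plan table (field names are illustrative)."""
    critical_system: str
    threat: str
    response_strategy: str
    recovery_strategy: str
    response_steps: list[str] = field(default_factory=list)
    recovery_steps: list[str] = field(default_factory=list)

    def missing_steps(self) -> list[str]:
        """Name the step lists that are still empty for this entry."""
        gaps = []
        if not self.response_steps:
            gaps.append("response")
        if not self.recovery_steps:
            gaps.append("recovery")
        return gaps

    def numbered_steps(self) -> list[str]:
        """Response steps first, then recovery steps, numbered in sequence."""
        ordered = self.response_steps + self.recovery_steps
        return [f"{i}. {step}" for i, step in enumerate(ordered, start=1)]


if __name__ == "__main__":
    # Invented example row, for illustration only.
    entry = PlanEntry(
        critical_system="Email server",
        threat="Server failure",
        response_strategy="Switch over to standby server",
        recovery_strategy="Repair or replace primary server",
        response_steps=["Verify the failure", "Activate standby server"],
    )
    print(entry.missing_steps())  # recovery steps have not been written yet
    for line in entry.numbered_steps():
        print(line)
```

Keeping response steps ahead of recovery steps in a single numbered list is one simple way to make sure the procedures stay linked in the proper sequence.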
Developing DR plans
DR plans provide a step-by-step process for responding to a disruptive event. Procedures should ensure an easy-to-use and repeatable process for recovering damaged IT assets and returning them to normal operation as quickly as possible. If staff relocation to a third-party hot site or other alternate space is necessary, procedures must be developed for those activities.
When developing your IT DR plans, be sure to review the global standards ISO/IEC 24762 for disaster recovery and ISO/IEC 27035 (formerly ISO/IEC TR 18044) for incident response activities.
In addition to using the strategies previously developed, IT disaster recovery plans should form part of an incident response process that addresses the initial stages of the incident and the steps to be taken. This process can be seen as a timeline, such as in Figure 2, in which incident response actions precede disaster recovery actions.
Figure 2: Disaster timeline
Note: We have included emergency management in Figure 2, as it represents activities that may be needed to address situations where humans are injured or situations such as fires that must be addressed by local fire brigades and other first responders.
The following section details the elements in a DR plan in the sequence defined by ISO/IEC 27031 and ISO/IEC 24762.
Important: Best-in-class DR plans should begin with a few pages that summarise key action steps (such as where to assemble employees if forced to evacuate the building) and lists of key contacts and their contact information for ease of authorising and launching the plan.
1. Introduction. Following the initial emergency pages, DR plans have an introduction that includes the purpose and scope of the plan. This section should specify who has approved the plan, who is authorised to activate it and a list of linkages to other relevant plans and documents.
2. Roles and responsibilities. The next section should define roles and responsibilities of DR team members, their contact details, spending limits (for example, if equipment has to be purchased) and the limits of their authority in a disaster situation.
3. Incident response. During the incident response process, we typically become aware of an out-of-normal situation (such as being alerted by various system-level alarms), quickly assess the situation (and any damage) to make an early determination of its severity, attempt to contain the incident and bring it under control, and notify management and other key stakeholders.
4. Plan activation. Based on the findings from incident response activities, the next step is to determine if disaster recovery plans should be launched, and which ones in particular should be invoked. If DR plans are to be invoked, incident response activities can be scaled back or terminated, depending on the incident, allowing for launch of the DR plans. This section defines the criteria for launching the plan, what data is needed and who makes the determination. Included within this part of the plan should be assembly areas for staff (primary and alternates), procedures for notifying and activating DR team members, and procedures for standing down the plan if management determines the DR plan response is not needed.
5. Document history. A section on plan document dates and revisions is essential, and should include dates of revisions, what was revised and who approved the revisions. This can be located at the front of the plan document.
6. Procedures. Once the plan has been launched, DR teams take the materials assigned to them and proceed with response and recovery activities as specified in the plans. The more detailed the plan is, the more likely the affected IT asset will be recovered and returned to normal operation. Technology DR plans can be enhanced with relevant recovery information and procedures obtained from system vendors. Check with your vendors while developing your DR plans to see what they have in terms of emergency recovery documentation.
7. Appendixes. Located at the end of the plan, these can include systems inventories, application inventories, network asset inventories, contracts and service-level agreements, supplier contact data, and any additional documentation that will facilitate recovery.
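If your plans are maintained as structured documents, a quick completeness check against the seven sections listed above is easy to automate. The following Python sketch assumes you can extract a list of section headings from each plan; the function name and the case-insensitive matching rule are illustrative choices, not requirements of any standard.

```python
# Section titles as listed above, in order.
REQUIRED_SECTIONS = [
    "Introduction",
    "Roles and responsibilities",
    "Incident response",
    "Plan activation",
    "Document history",
    "Procedures",
    "Appendixes",
]


def missing_sections(plan_headings: list[str]) -> list[str]:
    """Return the required sections that do not appear in plan_headings.

    Matching ignores case and surrounding whitespace (an assumed convention).
    """
    present = {heading.strip().lower() for heading in plan_headings}
    return [s for s in REQUIRED_SECTIONS if s.lower() not in present]


if __name__ == "__main__":
    # Hypothetical draft plan with only three sections written so far.
    draft = ["Introduction", "Procedures", "Appendixes"]
    for section in missing_sections(draft):
        print("Missing section:", section)
```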
Once your DR plans have been completed, they are ready to be exercised. This process will determine whether they will recover and restore IT assets as planned.
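One simple way to judge an exercise is to compare the recovery time you measured for each system with the RTO you set for it earlier. The sketch below assumes both are recorded per system as durations; the function name, the system names and the figures are hypothetical.

```python
from datetime import timedelta


def exercise_shortfalls(rto: dict[str, timedelta],
                        measured: dict[str, timedelta]) -> dict[str, str]:
    """List systems that missed their RTO in an exercise or were not exercised."""
    shortfalls = {}
    for system, target in rto.items():
        actual = measured.get(system)
        if actual is None:
            shortfalls[system] = "not exercised"
        elif actual > target:
            shortfalls[system] = f"exceeded RTO by {actual - target}"
    return shortfalls


if __name__ == "__main__":
    # Invented targets and exercise results, for illustration only.
    rto = {
        "email": timedelta(hours=4),
        "erp": timedelta(hours=8),
        "web": timedelta(hours=2),
    }
    measured = {
        "email": timedelta(hours=5),
        "erp": timedelta(hours=6),
    }
    for system, problem in exercise_shortfalls(rto, measured).items():
        print(f"{system}: {problem}")
```

Any system that appears in the output is a candidate for a revised procedure, a different strategy or a further exercise.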
Three additional activities run in parallel with these: creating employee awareness, training and records management. They are essential because they ensure that employees are fully aware of DR plans and their responsibilities in a disaster, and that DR team members have been trained in their roles and responsibilities as defined in the plans. Because DR planning generates a significant amount of documentation, records management (and change management) activities should also be initiated. If your organisation already has records management and change management programmes, use them in your DR planning.
This was first published in July 2011