Disaster recovery risk assessment and business impact analysis (BIA) are crucial steps in the development of a disaster recovery plan. But before we look at them in detail, we need to locate both in the overall planning process.
To do that, let us remind ourselves of the overall goals of disaster recovery planning, which are to provide strategies and procedures that can help return IT operations to an acceptable level of performance as quickly as possible following a disruptive event. The speed at which IT assets can be returned to normal or near-normal performance will impact how quickly the organisation can return to business as usual or an acceptable interim state of operations.
Having established our mission, and assuming we have management approval and funding for a disaster recovery initiative, we can establish a project plan.
A disaster recovery project has a fairly consistent structure, which makes it easy to organise and conduct plan development activity.
Adapted with permission from the BCM Lifecycle developed by the Business Continuity Institute.
As you can see from The IT Disaster Recovery Lifecycle illustration, the IT disaster recovery process has a standard process flow. In this, the BIA is typically conducted before risk assessment. The BIA identifies the most important business functions and the IT systems and assets that support them. Next, the risk assessment examines the internal and external threats and vulnerabilities that could negatively impact IT assets.
Following the BIA and risk assessment, the next steps are to define, build and test detailed disaster recovery plans that can be invoked in case disaster actually strikes the organisation’s critical IT assets. Such plans provide an easy-to-use, repeatable, step-by-step process for responding to a disruptive event and recovering damaged IT assets to normal operation as quickly as possible.
Detailed response planning and the other key parts of disaster recovery planning, such as plan maintenance, are outside the scope of this article, so let us get back to looking at disaster recovery risk assessment and business impact analysis in detail.
Disaster recovery risk assessment
In the IT disaster recovery world, we typically focus on one or more of the following four loss scenarios, any of which would have a negative impact on the organisation’s ability to conduct business:
- Loss of access to premises
- Loss of data
- Loss of IT function
- Loss of skills
Risk assessments focus on the risks that can lead to these outcomes.
Peter Barnes, FBCI, managing director of London-based 2C Consulting, said, “The key activities from an IT risk perspective are to consider the impact on the business if delivery of critical applications and services were to be denied as a result of a fire or server failure, for example, and to assess the risks that such a scenario might arise.”
A key aspect is to know what services run on which parts of the infrastructure, said Andrew Hiles, FBCI, managing director of Oxfordshire-based Kingswell International. “It sounds obvious, but one major insurance company had grown by acquisition and suddenly had several data centres,” he said. “They didn’t have a clue of the risks associated with their new acquisitions.”
One easy way to create a risk assessment is illustrated by this table.
Working with IT managers and members of your building facilities staff as well as risk management staff if you have them, you can identify the events that could potentially impact data centre operations.
Based on experience and available statistics, you can estimate the likelihood of specific events occurring on a scale of 0 to 1 (0.0 = will never occur, and 1.0 = will always occur). You can do the same with the impact of the event, using a 0 to 1 range (0.0 = no impact at all, and 1.0 = total loss of operations). The final column lists the product of likelihood x impact, and this becomes your risk factor. Those events with the highest risk factor are the ones your disaster recovery plan should primarily aim to address.
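The likelihood x impact calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed tool: the function names, event names and scores are all hypothetical, chosen only to show how the risk factor column is computed and ranked.

```python
def risk_factor(likelihood: float, impact: float) -> float:
    """Return likelihood x impact; both inputs must lie in the 0.0 to 1.0 range."""
    for name, value in (("likelihood", likelihood), ("impact", impact)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
    return likelihood * impact


def rank_events(events):
    """Sort (name, likelihood, impact) tuples by risk factor, highest first."""
    scored = [(name, lik, imp, risk_factor(lik, imp)) for name, lik, imp in events]
    return sorted(scored, key=lambda row: row[3], reverse=True)


# Hypothetical events and scores, for illustration only.
EXAMPLE_EVENTS = [
    ("Loss of key staff", 0.2, 0.4),
    ("Basement water leak", 0.3, 0.9),
    ("Server hardware failure", 0.6, 0.7),
]

if __name__ == "__main__":
    for name, likelihood, impact, factor in rank_events(EXAMPLE_EVENTS):
        print(f"{name:<26} {likelihood:.1f} x {impact:.1f} = {factor:.2f}")
```

Events at the top of the ranked list are the ones the plan should address first. In practice the scores would come from the workshop with IT, facilities and risk management staff rather than from a hard-coded list.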
Another way to capture and display risk information is with a risk matrix. Entries in each part of the above table can be plotted on a four-quadrant matrix, as shown here.
A risk matrix, adapted with permission from "Principles and Practice of Business Continuity: Tools and Techniques," by Jim Burtles, copyright 2007 by Rothstein Associates; ISBN 1-931332-39-8
In terms of how we treat these risks, we can use the following categorisation:
- Prevent: High-probability/high-impact events (actively work to mitigate these)
- Accept: Low-probability/low-impact events (maintain vigilance)
- Contain: High-probability/low-impact events (minimise likelihood of occurrence)
- Plan: Low-probability/high-impact events (plan steps to take if this occurs)
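The four treatment categories above map directly onto the quadrants of the matrix. The sketch below shows one way to assign a category from the 0 to 1 likelihood and impact scores. The 0.5 cut-off between "low" and "high" is an assumption made for illustration; the article does not define where that boundary sits, and the function name is hypothetical.

```python
def treatment(likelihood: float, impact: float, threshold: float = 0.5) -> str:
    """Map a likelihood/impact pair to a treatment category.

    The threshold separating low from high is an assumed value,
    not one taken from the article.
    """
    high_likelihood = likelihood >= threshold
    high_impact = impact >= threshold
    if high_likelihood and high_impact:
        return "Prevent"   # actively work to mitigate
    if high_likelihood:
        return "Contain"   # minimise likelihood of occurrence
    if high_impact:
        return "Plan"      # plan steps to take if this occurs
    return "Accept"        # maintain vigilance


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    for label, lik, imp in [
        ("Server hardware failure", 0.6, 0.7),
        ("Basement water leak", 0.3, 0.9),
        ("Printer jam", 0.8, 0.1),
        ("Loss of key staff", 0.2, 0.4),
    ]:
        print(f"{label}: {treatment(lik, imp)}")
```

An organisation with a lower appetite for risk could pass a smaller `threshold`, which moves more events into the Prevent and Plan quadrants.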
Types of risks to consider
In the previous section we described a basic disaster recovery risk assessment. But there are many types of risk, so which are the key ones to address from a UK IT perspective?
Supply chain disruptions present a key risk, said Susan Young, MBCI, a risk management professional with a London-based insurance company. “From an IT standpoint, reliance on outsourced providers not only presents a pure IT risk but also a supply chain risk. For example, in the Lloyd's insurance market in London, all businesses depend on a firm called Xchanging to provide premiums and claims processing. This is a huge dependency with very significant risks for the market as a whole.”
Hardware failure is another key danger to UK organisations. Kingswell International’s Andrew Hiles said, “A 2010 IBM report on UK email downtime showed hardware failure (server and SAN), connectivity loss and database corruption (in that order) as the main causes of downtime. A 2010 SunGard report said the most common cause of UK invocations was hardware, followed by power and communications.”
Water damage is a key risk to organisations in the UK, and sometimes the source can be so obvious it gets overlooked, said 2C’s Barnes. “Recently, I have noted servers in racks at floor level in basements,” he said. “While this area may be ‘dead space’ for offices, it is also where water accumulates when taps are left running in the toilets two floors above when everyone goes home on a Friday night.”
Business impact analysis
A BIA attempts to relate specific risks to their potential impact on things such as business operations, financial performance, reputation, employees and supply chains. The table below depicts the relationship between specific risks and business factors.
Risks can affect the entire company or just small parts of it. Operational and financial losses may be significant, and the impact of these events could affect the firm’s competitive position and reputation, for example.
BIAs are built on a series of questions that should be posed to key members of each operating unit in the company, including IT. Questions should address the following issues, as a minimum:
- Understanding how each business unit operates
- Identification of critical business unit processes that depend on IT
- Financial value of critical business processes (for example, revenues generated per hour)
- Dependencies on internal organisations
- Dependencies on external organisations
- Data requirements
- Minimum time needed to recover data to its previous state of use
- System requirements
- Minimum time needed to return to normal or near-normal operations following an incident
- Minimum number of staff needed to conduct business
- Minimum technology needed to conduct business
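Answers to questions like these can be captured in a simple record per critical process. The Python sketch below is one possible layout, assuming hypothetical field names and sample values. The revenue-at-risk figure is a rough estimate that multiplies hourly revenue by the recovery time; it is an illustrative simplification, not a method stated in the article.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BiaRecord:
    """One BIA interview result; all field names are illustrative assumptions."""
    business_unit: str
    critical_process: str
    revenue_per_hour: float              # financial value of the process
    operations_recovery_hours: float     # time to return to near-normal operations
    data_recovery_hours: float = 0.0     # time to recover data to its previous state
    minimum_staff: int = 0
    internal_dependencies: List[str] = field(default_factory=list)
    external_dependencies: List[str] = field(default_factory=list)

    def revenue_at_risk(self) -> float:
        """Rough estimate: hourly revenue multiplied by the recovery time."""
        return self.revenue_per_hour * self.operations_recovery_hours


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    record = BiaRecord(
        business_unit="Claims",
        critical_process="Claims processing",
        revenue_per_hour=5000.0,
        operations_recovery_hours=8.0,
        data_recovery_hours=4.0,
        minimum_staff=6,
        external_dependencies=["Outsourced processing provider"],
    )
    print(f"{record.critical_process}: {record.revenue_at_risk():.0f} at risk")
```

Collecting one record per process makes it easier to compare business units and to see which processes carry the largest financial exposure.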
BIA outputs should present a clear picture of the actual impacts on the business, both in terms of potential problems and probable costs. The results of the BIA should help determine which areas require which levels of protection, the extent to which the business can tolerate disruptions and the minimum IT service levels needed by the business.
2C Consulting’s Barnes said a key aim of the BIA should be to define the maximum period of time the business can survive without IT. “First, measure the tolerances to an outage for critical applications or infrastructure services,” he said. “Next, examine available options that increase resilience and reduce the risk of service loss, such that you can provide service to the business in an acceptable timeframe.”
Paul Kirvan is an independent consultant/IT auditor with more than 22 years of experience in business continuity, disaster recovery, security, enterprise risk management and telecom/IT auditing.
This was first published in May 2011