Imagine a situation in which a flash flood causes a total power outage at a data center, or one in which an act of vandalism interrupts power to a section of the data center. The resulting unavailability of the data center could hurt the business, especially when critical IT services are delivered from it. Disaster recovery (DR) is therefore of prime importance, and risk assessment methodology forms the basis of a successful DR plan.
Business impact analysis
The first step in any risk assessment methodology is to form a steering committee made up of top management representatives, the heads of the operations and facilities teams, and business heads. Working with consultants, this committee should:
- Determine legal, contractual and regulatory obligations associated with business processes critical to business continuity.
- Identify and prioritize key products and services and critical business processes supporting their delivery.
- Determine people, process and technology dependencies of these processes.
- Determine the activities associated with the delivery of products and services, and establish the maximum time each activity can be down before the business is tangibly damaged.
- Determine all IT services that enable business processes.
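The last two steps above (maximum downtime per activity, and the IT services each process depends on) can be combined to rank IT services by recovery urgency. The sketch below is a minimal illustration under stated assumptions: the `BusinessProcess` record, the `it_service_priorities` function and the sample processes are all hypothetical, and a service simply inherits the tightest downtime limit of any critical process that uses it.

```python
from dataclasses import dataclass, field


@dataclass
class BusinessProcess:
    """Hypothetical record produced by the business impact analysis."""
    name: str
    critical: bool
    # Maximum hours the process can be down before the business is damaged.
    max_downtime_hours: float
    # IT services the process depends on.
    it_services: list[str] = field(default_factory=list)


def it_service_priorities(processes):
    """Give each IT service the tightest downtime limit among the critical
    processes that depend on it; return (service, hours) pairs, most urgent first."""
    limits: dict[str, float] = {}
    for p in processes:
        if not p.critical:
            continue  # non-critical processes do not drive recovery priorities
        for svc in p.it_services:
            limits[svc] = min(limits.get(svc, float("inf")), p.max_downtime_hours)
    return sorted(limits.items(), key=lambda item: item[1])


# Hypothetical example data.
processes = [
    BusinessProcess("Order processing", True, 4, ["ERP", "Email"]),
    BusinessProcess("Payroll", True, 48, ["ERP", "HR system"]),
    BusinessProcess("Intranet news", False, 168, ["Intranet"]),
]

for service, hours in it_service_priorities(processes):
    print(f"{service}: restore within {hours:g} h")
# ERP: restore within 4 h
# Email: restore within 4 h
# HR system: restore within 48 h
```

The intranet never appears in the output because only a non-critical process depends on it.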
Risk assessment methods
Once all critical business processes have been identified, risk assessment methodology is used to identify risks to the business. Each risk is rated as low, medium or high, depending on how likely it is to result in an incident that harms the business. Risks can be broadly classified into three types:
Natural risks: These include floods, earthquakes and storms.
Man-made risks: These include terrorist attacks, riots, strikes, sabotage, fire, arson and cyber warfare. Such risks also arise from human error that leads to process failures, and from disgruntled employees or corporate spies who damage or steal organizational assets.
To rate natural or man-made risks, a historical analysis is essential to determine how likely a particular incident is. This involves counting the instances of that incident in a given location over a period of five to ten years. For instance, the deluge and floods that hit Mumbai on 26 July 2005 were unprecedented. A risk assessment carried out ten years ago would have rated floods as a low risk for Mumbai; today, the same assessment would rate them as a medium or high risk.
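The historical analysis can be sketched as a count of incidents within a five- to ten-year observation window, mapped to a rating. This is a minimal illustration, not a standard method: the function name `rate_from_history` and its thresholds (one incident for medium, three for high) are hypothetical and would be set by each organization. The only data point used is the 26 July 2005 Mumbai flood.

```python
def rate_from_history(incident_years, start_year, end_year,
                      medium_at=1, high_at=3):
    """Rate a risk low/medium/high from how many times the incident occurred
    at a location between start_year and end_year (inclusive).
    Thresholds are hypothetical and must be tuned per organization."""
    window = end_year - start_year + 1
    if not 5 <= window <= 10:
        raise ValueError("observation window should span five to ten years")
    count = sum(1 for year in incident_years if start_year <= year <= end_year)
    if count >= high_at:
        return "high"
    if count >= medium_at:
        return "medium"
    return "low"


mumbai_floods = [2005]  # the 26 July 2005 deluge
print(rate_from_history(mumbai_floods, 1995, 2004))  # low: window ends before 2005
print(rate_from_history(mumbai_floods, 2000, 2009))  # medium: window includes 2005
```

The same single event moves the rating as soon as it falls inside the observation window, which is the shift the Mumbai example describes.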
Man-made risks such as terrorist attacks can be rated by examining past instances of such attacks in the given location, or by studying the prevailing socio-political situation. Risk assessment for acts such as arson, strikes, cyber attacks or employee misconduct is based on previous incidents, as well as on the ethical, regulatory, and access and authorization policies the organization currently enforces.
The responsible member of the steering committee needs to consult government, security, law-and-order and meteorological agencies to gather the detailed information required to assess the risks relevant to the business location in question. The recently launched National Disaster Management Authority (NDMA) aims to provide comprehensive information and statistics on various disasters and risks. Currently, this information is provided on request through the NDMA website.
Technology risks: When an IT component or device fails, IT services can become unavailable, causing business processes to break down and potentially losing data. The criteria for rating an IT risk as low, medium or high vary from one organization to another. An IT component failure is rated as a high risk if it causes a critical business process to fail, but as a low risk if only non-critical business processes are affected. For instance, a server failure is a high risk for a manufacturing unit running an ERP system, while the failure of a few desktop terminals is a low risk. For a call center, however, the failure of desktop terminals is as high a risk as a server failure.
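The rating rule above (high if a critical process depends on the failed component, low otherwise) can be sketched as follows. This is a minimal illustration under assumptions: `rate_component_failure` and the process and component names are hypothetical, and the dependency maps stand in for the people, process and technology dependencies gathered during the business impact analysis.

```python
def rate_component_failure(component, dependencies, critical_processes):
    """Rate the failure of one IT component.

    dependencies: maps each business process to the set of components it needs.
    critical_processes: names of processes flagged critical in the impact analysis.
    Returns "high" if any critical process depends on the component, else "low".
    """
    affected = {proc for proc, comps in dependencies.items() if component in comps}
    return "high" if affected & set(critical_processes) else "low"


# Hypothetical manufacturing unit: the ERP server supports a critical process,
# while a few desktop terminals support only office administration.
manufacturing = {
    "Production planning (ERP)": {"ERP server"},
    "Office administration": {"Desktop terminals"},
}
print(rate_component_failure("ERP server", manufacturing, {"Production planning (ERP)"}))         # high
print(rate_component_failure("Desktop terminals", manufacturing, {"Production planning (ERP)"}))  # low

# Hypothetical call center: agent desktops are part of the critical process itself.
call_center = {"Customer call handling": {"Call server", "Agent desktops"}}
print(rate_component_failure("Agent desktops", call_center, {"Customer call handling"}))          # high
```

The same component type gets different ratings in the two organizations because the rating depends on what the business relies on, not on the hardware itself.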
There are no standard risk assessment guidelines that can be applied across the board; the methodology must be customized to each organization's business needs. The overall objective of risk assessment is to identify single points of failure, so that resilience can be built in to safeguard the organization’s business from high-risk incidents.
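Single points of failure can be found by failing one component at a time and checking whether any critical service goes down. The sketch below assumes a simple, hypothetical model: each service needs at least one working component from every redundancy group listed for it. The function `single_points_of_failure` and all component names are illustrative, and the check covers single failures only, not combinations.

```python
def single_points_of_failure(requirements):
    """requirements: maps each critical IT service to a list of redundancy
    groups (sets of interchangeable components). A service stays up while
    every group still has at least one working member.
    Returns the components whose failure alone takes some service down."""
    components = {c for groups in requirements.values() for g in groups for c in g}
    spofs = set()
    for failed in components:
        for groups in requirements.values():
            # The service fails if removing this component empties any group.
            if any(not (set(g) - {failed}) for g in groups):
                spofs.add(failed)
                break
    return spofs


# Hypothetical dependency model.
requirements = {
    "ERP": [{"erp-db-1", "erp-db-2"}, {"core-switch"}, {"ups-a", "ups-b"}],
    "Email": [{"mail-1"}, {"core-switch"}],
}
print(sorted(single_points_of_failure(requirements)))  # ['core-switch', 'mail-1']
```

Each component reported is a candidate for added redundancy; components in groups of two or more survive a single failure in this model.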
About the authors: R Vaidhyanathan is vice president and head – crisis management & BCM practice at Continuity and Resilience (CORE). S Seshadri is practice head – DR and ITSM at CORE.
(As told to Harshal Kallyanpur)