Disaster recovery: Strategies for rapid restoration

Disaster recovery depends on how rapidly you can restore mission-critical data, but many organisations are still hesitant to prioritise rapid recovery strategies.

Ideally, a well-conceived disaster recovery plan will include strategies for rapidly restoring any mission-critical business applications. However, focusing on the recovery of key individual processes is still not a priority for many CIOs.

Getting back to business

When a steam pipe exploded in July, employees at Juma Technology were cut off from their offices -- and their laptops -- for six weeks. The health department ordered the building closed because of asbestos contamination and other health risks. Despite the enormous inconvenience, the company didn't miss a beat.

With extension technology from Juniper Networks, Juma employees created virtual private networks from non-company computers, tunnelling over Secure Sockets Layer (SSL) to reach their applications and data. Remote Voice over Internet Protocol (VoIP) technology from Avaya, together with a wide area network (WAN) optimisation appliance, let them answer sales and service calls the day after the explosion.

Being able to restore business as usual so quickly was the result of deliberate disaster recovery planning, said Chief Technology Officer Joe Fuccillo. Juma helps companies integrate products from various telecommunications vendors, so it can't afford to be out of reach, whatever the circumstances.

"Our people can get to that information from anywhere, because it's not locked away in a physical office. It's always replicated in multiple places," and available to users through Web browsers, handheld devices, cell phones or computers, Fuccillo said.

Surprisingly, many companies take a macro view of disaster recovery planning and fail to give their most important processes -- those that make money and serve customers -- top priority, experts say.

"We find that companies are willing to spend money on disaster recovery, although they tend to overlook the necessity of rapidly recovering their financial and operational data," said Rob Latimer, an analyst at GlassHouse Technologies.

Identify and classify

As a subset of a more comprehensive, organisation-wide approach to disaster planning, rapid recovery zeroes in on a limited number of business functions.

"The best practice is to have some form of business impact assessment, or BIA, in place before you begin putting together disaster recovery plans and building disaster recovery sites," said John Morency, a research director at Gartner.

Armed with a BIA, a company can determine which business processes are the lifeblood of its existence, along with any supporting applications. Typically, those priorities are shaped not by IT departments but by line-of-business users, risk managers and other people to whom data must be continuously accessible.
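To make the idea concrete, the short Python sketch below ranks business processes by the estimated cost of an hour's outage and pulls out the applications that support the most critical ones. The process names, loss figures and the `BusinessProcess`, `rank_by_impact` and `critical_applications` names are all hypothetical; a real BIA would draw its numbers from line-of-business owners and risk managers, not from IT.

```python
from dataclasses import dataclass, field


@dataclass
class BusinessProcess:
    name: str
    hourly_loss: float        # hypothetical estimate of revenue lost per hour of outage
    customer_facing: bool     # whether customers feel the outage directly
    applications: list[str] = field(default_factory=list)  # supporting applications


def rank_by_impact(processes: list[BusinessProcess]) -> list[BusinessProcess]:
    """Most damaging outages first; ties go to customer-facing processes."""
    return sorted(processes, key=lambda p: (p.hourly_loss, p.customer_facing), reverse=True)


def critical_applications(processes: list[BusinessProcess], loss_threshold: float) -> set[str]:
    """Applications supporting any process whose hourly loss meets the threshold."""
    return {
        app
        for p in processes
        if p.hourly_loss >= loss_threshold
        for app in p.applications
    }


# Hypothetical inputs for illustration only.
processes = [
    BusinessProcess("order entry", 50_000, True, ["erp", "web-store"]),
    BusinessProcess("payroll", 2_000, False, ["hr-system"]),
    BusinessProcess("customer support", 10_000, True, ["crm", "voip"]),
]
print([p.name for p in rank_by_impact(processes)])
# ['order entry', 'customer support', 'payroll']
print(sorted(critical_applications(processes, loss_threshold=10_000)))
# ['crm', 'erp', 'voip', 'web-store']
```

The threshold is the lever a company would tune: lowering it widens the set of applications that need rapid recovery, and with it the cost.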

Latimer explained, "You never want to over-design your operational recovery capabilities vs. your disaster recovery capabilities. For cost reasons more than anything else, it's important to narrow the amount of data that might need to be recovered at a remote location."

Set timelines and tolerances

Once they know their starting point, CIOs should ask two pivotal disaster recovery questions: "First, how long could the business operate without the data? Second, how much data loss could the organisation tolerate?" said Greg Schulz, founder and senior analyst at The StorageIO Group.
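Schulz's two questions correspond to what disaster recovery planners usually call the recovery time objective (RTO) and the recovery point objective (RPO). A minimal sketch, assuming both are expressed as durations, of checking a candidate recovery strategy against them might look like this; the class names and the example figures are illustrative only.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjectives:
    rto: timedelta  # how long the business can operate without the data
    rpo: timedelta  # how much recent data (measured as time) it can afford to lose


@dataclass(frozen=True)
class RecoveryStrategy:
    name: str
    restore_time: timedelta   # estimated time to bring the data back into service
    max_data_loss: timedelta  # worst-case age of the newest recoverable copy


def meets(strategy: RecoveryStrategy, objectives: RecoveryObjectives) -> bool:
    """A strategy qualifies only if it satisfies both objectives."""
    return (strategy.restore_time <= objectives.rto
            and strategy.max_data_loss <= objectives.rpo)


# Illustrative figures, not measurements.
objectives = RecoveryObjectives(rto=timedelta(hours=8), rpo=timedelta(hours=1))
nightly_tape = RecoveryStrategy("nightly tape", timedelta(hours=24), timedelta(hours=24))
async_replica = RecoveryStrategy("async replica", timedelta(hours=2), timedelta(minutes=15))

for s in (nightly_tape, async_replica):
    print(s.name, meets(s, objectives))
```

Keeping the two objectives separate matters: a strategy can restore quickly yet still lose too much data, or the reverse.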

Morency said a recovery period between 24 and 72 hours is acceptable for many applications. Data that powers production networks, on the other hand, typically needs much quicker restoration.

"At least 60 to 70 percent of organisations say they have at least one application that requires recovery within 24 hours," Morency said, and for many firms that window is shrinking to between four and eight hours.
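One way to act on these windows is to sort applications into recovery tiers. The sketch below assumes hypothetical tier names and uses cut-offs of 8, 24 and 72 hours, loosely following the figures Morency cites; the application names are invented.

```python
from datetime import timedelta

# Hypothetical tier names; cut-offs loosely follow the 4-8, 24 and 24-72 hour windows above.
TIERS = [
    ("tier 1", timedelta(hours=8)),
    ("tier 2", timedelta(hours=24)),
    ("tier 3", timedelta(hours=72)),
]


def assign_tier(required_rto: timedelta) -> str:
    """Return the first tier whose recovery window covers the required RTO."""
    for name, limit in TIERS:
        if required_rto <= limit:
            return name
    return "tier 4"  # can tolerate more than 72 hours of downtime


def group_by_tier(required_rtos: dict[str, timedelta]) -> dict[str, list[str]]:
    """Group application names by tier, alphabetically within each tier."""
    groups: dict[str, list[str]] = {}
    for app, rto in sorted(required_rtos.items()):
        groups.setdefault(assign_tier(rto), []).append(app)
    return groups


# Hypothetical applications and requirements.
print(group_by_tier({
    "order-entry": timedelta(hours=4),
    "reporting": timedelta(hours=48),
    "email": timedelta(hours=12),
}))
```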

The time crunch is especially critical for health care organisations, including The Methodist Hospital System in Houston. Under an agreement with SunGard Data Systems, an enterprise software company, Methodist's backup tapes are shipped offsite in the event of a disaster and used to re-create its most important production networks, including its revenue, laboratory and pharmacy systems.

Tropical Storm Allison gave the hospital an opportunity to field-test its procedures in 2001. Heavy rains flooded a data centre housing the pharmacy computer system, triggering emergency procedures that enabled the data to be recovered in about a day.

Although tapes provide a respectable 24-hour recovery period, Methodist CIO Brian Schwinger said the hospital is evaluating the purchase of a tier-one storage area network (SAN) to replicate essential application data. A final decision on a SAN probably won't be made until 2009, when the hospital expects to finish building a new data centre.

"A high-quality SAN would enable us to have the data available within minutes after a disaster," Schwinger said.

Budget for technologies

If backup tapes aren't desirable, vendors such as EMC, IBM and a host of smaller players offer asynchronous replication technology. These products batch updates, send them to remote sites, then synchronise the data between primary and secondary storage. Although this approach suits shorter recovery periods, Morency cautioned that batch-oriented replication means some data will be lost.
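The trade-off Morency describes can be shown with a toy in-memory model, assuming a fixed batch size: writes complete on the primary at once but reach the secondary only when a batch fills, so anything still queued when the primary fails is lost. The `AsyncReplicator` class is purely illustrative and is not modelled on any vendor's product.

```python
class AsyncReplicator:
    """Toy model of batch-oriented asynchronous replication (hypothetical, in-memory)."""

    def __init__(self, batch_size: int):
        self.batch_size = batch_size
        self.primary: list[str] = []
        self.secondary: list[str] = []
        self._pending: list[str] = []  # written locally, not yet shipped

    def write(self, record: str) -> None:
        # The write completes on the primary immediately; replication is deferred.
        self.primary.append(record)
        self._pending.append(record)
        if len(self._pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # Ship the whole batch to the remote site in one transfer.
        self.secondary.extend(self._pending)
        self._pending.clear()

    def fail_primary(self) -> list[str]:
        """Simulate losing the primary site; return updates that never reached the secondary."""
        lost = list(self._pending)
        self._pending.clear()
        self.primary.clear()
        return lost


replicator = AsyncReplicator(batch_size=3)
for i in range(5):
    replicator.write(f"txn-{i}")
lost = replicator.fail_primary()
print("replicated:", replicator.secondary)  # replicated: ['txn-0', 'txn-1', 'txn-2']
print("lost:", lost)                        # lost: ['txn-3', 'txn-4']
```

A larger batch means fewer, more efficient transfers but a wider window of updates at risk; in real products the window is usually bounded by time as well as by size.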

Another option is data mirroring, a real-time replication method with a smaller window for data loss. The downside: mirroring requires costly SONET-class network bandwidth, on the order of tens if not hundreds of megabits per second.
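The bandwidth cost follows from how real-time mirroring works. Assuming the mirroring is synchronous, each write must cross the link before it is acknowledged, so the link has to carry the peak write rate rather than the average. A rough sizing sketch, assuming decimal megabytes and a hypothetical 20 percent allowance for protocol overhead:

```python
def required_link_mbps(peak_write_mb_per_s: float, protocol_overhead: float = 0.2) -> float:
    """Rough link capacity, in megabits per second, for synchronous mirroring.

    peak_write_mb_per_s: peak rate of writes to the mirrored volume, in decimal MB/s.
    protocol_overhead:   hypothetical allowance for framing and protocol traffic.
    """
    if peak_write_mb_per_s < 0 or protocol_overhead < 0:
        raise ValueError("rates and overhead must be non-negative")
    megabits = peak_write_mb_per_s * 8  # 1 byte = 8 bits
    return megabits * (1 + protocol_overhead)


for peak in (2, 10, 50):  # hypothetical peak write rates in MB/s
    print(f"{peak} MB/s peak writes -> about {required_link_mbps(peak):.0f} Mbps")
```

At 10 MB/s of peak writes the estimate is roughly 96 Mbps, in line with the tens to hundreds of megabits per second cited above; the figure scales linearly with the peak write rate.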

Combining remote mirroring with local snapshots or full clones of important data provides a hedge against unforeseen risks, Schulz said.

Regardless of the chosen recovery medium, Morency advises CIOs to consider buying intelligent software that "understands the properties of data" and performs integrity checks, so that structural or formatting errors are not replicated during backup and archiving.
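Commercial products implement such checks in their own ways. As a rough illustration only, the sketch below assumes records are JSON documents with a hypothetical required schema: it rejects malformed records before they are queued for replication and attaches a SHA-256 digest so the receiving site can confirm each copy arrived intact.

```python
import hashlib
import json

REQUIRED_FIELDS = {"id", "amount"}  # hypothetical schema for the records being replicated


def is_well_formed(raw: bytes) -> bool:
    """Structural check: valid JSON object containing the required fields."""
    try:
        record = json.loads(raw)
    except (UnicodeDecodeError, json.JSONDecodeError):
        return False
    return isinstance(record, dict) and REQUIRED_FIELDS.issubset(record)


def prepare_for_replication(records: list[bytes]) -> tuple[list[tuple[bytes, str]], list[bytes]]:
    """Split records into (record, SHA-256 digest) pairs to replicate and rejects to quarantine."""
    accepted: list[tuple[bytes, str]] = []
    rejected: list[bytes] = []
    for raw in records:
        if is_well_formed(raw):
            accepted.append((raw, hashlib.sha256(raw).hexdigest()))
        else:
            rejected.append(raw)
    return accepted, rejected


def verify_on_arrival(raw: bytes, expected_digest: str) -> bool:
    """Run at the secondary site: confirm the copy matches the digest computed at the source."""
    return hashlib.sha256(raw).hexdigest() == expected_digest


batch = [b'{"id": 1, "amount": 9.5}', b'{"id": 2', b'{"id": 3}']
accepted, rejected = prepare_for_replication(batch)
print(len(accepted), "accepted,", len(rejected), "rejected")  # 1 accepted, 2 rejected
```

The two checks catch different problems: the structural test stops bad data from being copied at all, while the digest catches corruption introduced in transit or at the remote site.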
