All that changed with the arrival into the mainstream of server virtualisation, which eliminated the requirement that the secondary site be a hardware carbon copy of the primary one. A key benefit of virtualisation is that it cuts the cost of disaster recovery (DR) infrastructure, which brings DR within reach of companies that were previously unable to afford it.
DR requires restoration of an entire system quickly and easily, and virtualisation scores big here. Because a VM is independent of the underlying hardware, it's easy to move or copy it from one physical server to another from which you can recover after a disaster. Meanwhile, the falling cost of bandwidth means you can use remote offices as DR sites.
Even applications that do not run on virtual servers can be virtualised when replicated. If a mirrored virtual server runs at only 80 percent of the performance of the physical production machine, that's still a lot better than complete service failure.
To further exploit the DR advantages of server virtualisation, hypervisor vendors are tailoring their products to DR requirements. VMware's Site Recovery Manager, for example, offers features that support DR, such as planning, discovery and testing, and automated failover. Microsoft's Hyper-V has fewer features aimed specifically at DR but can be successfully combined with products such as Vision Solutions’ Double-Take, which replicates servers and keeps them in sync. (VMware’s vSphere also supports such replication products.)
But there are wrinkles. You need to remember, for example, that in a Windows setup you'll need to restore the domain controllers first or none of the services dependent on Active Directory, such as Microsoft Exchange, will work.
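The restore-order wrinkle can be written down as a dependency map and sorted, so that nothing is brought up before the services it relies on. The following is only a minimal sketch in Python 3.9 or later: the service names and the dependency map are hypothetical illustrations, not taken from either company's DR plan.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each service lists what must be running before it.
DEPENDS_ON = {
    "domain-controller": set(),
    "exchange": {"domain-controller"},
    "file-and-print": {"domain-controller"},
    "sql-server": {"domain-controller"},
}


def restore_order(depends_on):
    """Return a restore sequence in which every service follows its dependencies."""
    # static_order() yields each node only after all of its predecessors.
    return list(TopologicalSorter(depends_on).static_order())


order = restore_order(DEPENDS_ON)
print(order)
```

In this sketch the domain controller always comes out first because every other entry depends on it. If the map contained a circular dependency, `static_order()` would raise `graphlib.CycleError` rather than return a misleading sequence.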
Designing and implementing a DR plan is a complex task, with variables that include technology, corporate policies and available skill sets. Businesses are also starting to take advantage of cloud-based DR services, which are usually underpinned by server virtualisation too.
To investigate how UK companies are using server virtualisation, we talked with two London-based IT organisations in the financial services sector, one of which is using a managed service provider for DR services and the other an in-house replication solution involving custom PowerShell scripts.
Hampden Capital outsources DR
Hampden Capital delivers financial services to the Lloyds-backed insurance market. It is regulated by the Financial Services Authority (FSA) and follows the FSA's best practices.
"If the best practices are to do ‘xyz,’ that's what we do," said Andrew Hough, IT manager. "This means we need to know how to be up and running within 30 to 60 minutes of a disaster."
Hampden Capital discussed that requirement with Frontier Technology, its incumbent supplier for other IT services, and then subscribed to Frontier’s managed continuity solution. Frontier now mirrors all Hampden's servers, both physical and virtual, to virtual machines on its premises.
"We told them which servers we wanted to cover, their sizes and how quickly we wanted them back working," said Hough. "They manage it and test it once or twice a year, and we take part so we know it happened."
The company employs about 150 people and houses its systems in its offices in the City of London. It has 10 physical servers (eight in London and the rest in an office in Buckinghamshire) and runs more than 20 virtual machines. Some servers remain unvirtualised, such as Exchange, file and print, the main SQL server, and Linux, but several Citrix servers, SUSE Linux and miscellaneous application servers are virtualised.
When it came to DR, said Hough, "We wanted to cover Microsoft Exchange, SQL Server, the file and print servers, document management servers, and a Linux box."
Before it bought into Frontier Technology's services, Hampden had a limited DR plan. "In the event of a disaster, we would have had to rebuild from scratch using servers held off-site," Hough said.
The impetus for a fresh look at how Hampden managed its DR came from the board, which mandated a "substantially quicker" DR process than had previously been in place. It previously took eight hours to restore email, and two days for the rest of the systems. The board rejected that as insufficient, prompting Hough to look for a managed service.
When it came to selecting a DR provider, one large telecoms provider was rejected because of fears about the disparity between the sizes of the two companies. "We weren't confident about the level of touchy-feely support they could offer," said Hough. The other rejected candidate was a City-based IT provider that proffered a shopping list from which "they could probably do this or that," Hough said. "We preferred that someone else do the trailblazing."
The replication uses a VPN between Hampden’s and Frontier's offices over a 50 Mbps leased line, and the DR plan is regularly tested. "Testing of the DR plan works on the basis that Frontier breaks the link on the synchronisation," Hough said. The company has successfully conducted three tests in the last 18 months.
"The service sends text messages to us and Frontier in the event of any possible problem, and if we decide to switch over to the DR service, staff can then connect in via Citrix. Frontier provides an alternative public Web address to point to, so we can use Citrix services that we are all used to and it looks just the same," Hough said.
"The text messages have been received very rarely and only when the power has gone down. We're as confident as we can be that it works as it's supposed to."
CMA Vision’s custom VM replication
CMA Vision is a financial institution whose market analysis creates large volumes of highly sensitive data. The company is heavily dependent on its databases, which process the data in real time and are backed up twice daily.
Its IT infrastructure includes nine VMware-based host servers that support more than 100 virtual machines connected to a Compellent SAN, all installed in the company's data centre in London's Docklands. Its Cannon Street DR and backup location is a short distance away.
Ryan Sclanders, IT infrastructure manager, is in the process of revamping the company's DR plan, which is built around a mirrored version of the production site.
"I don't store any data on the VMs, so it's easy to duplicate a VM and the configuration," Sclanders said. "All the data is in the databases on the SAN so I use SAN replication."
An advantage of SAN replication is that the SAN bears the load instead of the host servers. "If the link goes down between the two sites, Compellent Enterprise Manager will alert me and create recovery deltas for when the link comes back up," Sclanders said.
Sclanders didn't buy software to effect the replication components of his DR plan but instead wrote them himself in Windows PowerShell. "The scripts automatically create a copy in the DR environment with a set of matching IP addresses," he said. Twice a day the scripted operation creates a writable view of a SAN volume snapshot using the Compellent view feature at the DR site. It then attaches the view to the DR site's database server and makes it active.
Meanwhile, whenever a VM is created at the primary site, a replica is created at the secondary site. This is completely automated by one of Sclanders' custom-created PowerShell scripts.
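Sclanders' scripts are written in PowerShell and are not reproduced here. Purely to illustrate the idea of keeping the DR site's VM inventory in step with the primary site, a rough Python sketch might compare the two inventories and list the replicas still to be created, each with the same IP address as its original. The `VM` type, the `plan_replicas` function, the VM names and the addresses are all hypothetical, and the sketch does not call any VMware or Compellent interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    """Hypothetical minimal VM record: just a name and an IP address."""
    name: str
    ip: str


def plan_replicas(primary, dr):
    """List replicas to create at the DR site, keeping each VM's name and IP."""
    existing = {vm.name for vm in dr}
    # A replica is planned for every primary VM whose name is absent at the DR site.
    return [VM(vm.name, vm.ip) for vm in primary if vm.name not in existing]


# Illustrative inventories only (addresses are from a documentation range).
primary = [
    VM("app01", "192.0.2.10"),
    VM("web01", "192.0.2.11"),
    VM("db-proxy01", "192.0.2.12"),
]
dr = [VM("app01", "192.0.2.10")]

for vm in plan_replicas(primary, dr):
    print(f"create replica {vm.name} with IP {vm.ip}")
```

Because no data is stored on the VMs themselves, a plan like this only has to describe configuration; the data arrives separately through SAN replication.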
"This essentially means I have a warm site," said Sclanders. "In the event of a failover, it's not immediate--the shortest time to recovery is four hours, but I can test it at any stage by taking a view and restoring the services."
Because he's in the middle of developing the DR scheme, Sclanders said that the system has yet to undergo a complete live test. "However, I have tested the application server and the binary website and backup facility, and it all works," he said. "Once completed we will simulate a complete DR failover."
During project evaluation, CMA Vision examined VMware's Site Recovery Manager, Vizioncore’s vReplicator (now owned by Quest Software) and HP StorageWorks Storage Mirroring software. SRM was rejected, Sclanders said, because “I didn't need to replicate the virtual machine volumes. There is no data stored on the virtual machines, and that allowed me to create a duplicate with matching IP addresses on the DR side, rather than a replicated copy.”
The HP and Vizioncore products were rejected for three reasons: they are host-based and would have added to system overheads, they would have created huge recovery log files in the event of a network outage, and they had difficulties with snapshots that would have affected replication.
"They didn't tick all the boxes. I was hoping to be able to have a live volume where I could replicate to a DR server that would be active. That didn't work," Sclanders said. "The closest I could come was a view of the volume that I could make active and have it as an attached volume."
Sclanders added that the system will be fully operational within a few months and that the motivation for scripting it himself was not just financial. "I just enjoy scripting," he said.
Manek Dubash is a UK-based journalist with more than 25 years of experience.