Getting it right requires you to think through the data protection aspects of virtualisation and factor them in when planning a project. That's the lesson learned from the mistakes of early adopters, who often assumed they could simply transplant their existing backup processes.
"Companies still trying to manage multiple VMware backup agents on a single ESX machine are the ones having problems," notes Blair Innes, a storage industry veteran who's now a senior sales consultant at Selway Moore Limited, a storage and data centre specialist based in Reading, Berkshire.
That's because traditional backup methods expect a one-to-one relationship between server hardware and software, says Innes. But if one machine hosts many virtual machines (VMs), that's a lot of backup agents to buy, install and manage.
In addition, traditional methods assume users know that the servers are there and need protecting. But that won't necessarily be the case with VMs, warns Tony Lock, a programme director at Freeform Dynamics Ltd., an analyst group in New Milton, Hampshire.
Beware server sprawl
Lock says virtualisation makes it so easy to create new systems that the result is often "virtual server sprawl." As with the arrival of the PC 25 years ago, organisations can find themselves with critical data residing on VMs that are unprotected because IT and storage administrators don't know they exist.
If you have a corporate standard for backup software, adds Lock, there's also the question of whether it will actually work with VMs. And if it does, are you licensed to use it that way?
"Most backup players license differently," he says. "Some may need a license per machine, virtual or physical. Then there are extra modules. Are they per VM or per backup server? A lot of service providers aren't always familiar with the variety of licensing scenarios available."
What makes a virtual machine easier to protect than a physical one is the level of abstraction provided by the hypervisor. That's the software layer -- such as VMware ESX or Microsoft Hyper-V -- that presents the VM as if it's a physical server.
This abstraction means a VM is just a handful of large files on its host machine, which means you can quickly copy it to another server acting as a hot spare. A notable advantage over traditional high-availability mirroring is that the replica doesn't have to run on the same hardware, making mirroring cheaper to achieve.
For instance, London-based financial services company Matrix Group Limited uses Double-Take Software's Double-Take for VMware Infrastructure to protect its VMware systems by mirroring the virtual servers to machines at its disaster recovery (DR) site, which is hosted by IT services provider Oncore IT.
"It replicates in real-time and means we can be up and running in 10 minutes if a failure occurs," says group IT manager Laurence Duff. "The project is very sophisticated, but what it does is quite simple."
Restoring from snapshots
This ability to rapidly copy a live VM can also be used for backups, with tools such as Symantec Corp.'s Veritas NetBackup, VMware's Consolidated Backup (VCB), Veeam Software's Backup and Vizioncore Inc.'s vRanger Pro able to snapshot a server so it can be backed up whole. In effect, it gives a new and simpler way to do a bare-metal restore.
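Because a VM is just a set of files on its host, the snapshot-then-copy workflow these tools automate can be sketched in a few lines. This is an illustrative sketch only, with hypothetical directory layouts; the real products drive the hypervisor's own snapshot API to quiesce the VM first rather than copying raw files.

```python
import shutil
from pathlib import Path

def backup_vm(vm_dir: Path, backup_dir: Path) -> Path:
    """Copy a quiesced VM's files (disk images, config) as one unit.

    Assumes the hypervisor has already snapshotted the VM so the
    files are in a consistent state -- the step tools such as VCB
    or vRanger handle for you.
    """
    dest = backup_dir / vm_dir.name
    shutil.copytree(vm_dir, dest, dirs_exist_ok=True)
    return dest
```

Restoring is the mirror image: copy the files back and register the VM with the hypervisor, which is what makes this a simple form of bare-metal restore.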
Associated Newspapers Limited uses vRanger and deduplicating disk-based backup arrays from Data Domain Inc. to protect its VMware servers. "In terms of our ability to restore virtual machines, we're now in a far better position to act," says Steve Bruck, infrastructure architect at the publishing company.
"The fact that we're no longer trawling through tape libraries, combined with our ability to restore at disk-to-disk speed rather than tape-to-disk, means our recovery times are a great deal faster," he adds. "We might be looking at half an hour to restore a server rather than the half a day that our old solution would have taken."
When building data protection into a virtualisation strategy, you should expect an increase in load on hardware and network resources. Virtualised servers can be much more productive than physical servers, running up to 30 VMs or more per machine. But that in turn means 30 servers sharing a single network connection and a single boot disk.
It also means less spare processor capacity for occasional tasks, such as running backups, says McCreath.
"Virtualisation is an exceptionally hardware-intensive way of doing things, albeit a very efficient one," he says. "The whole point is to get up to 70% or 80% utilization. But that might not leave you [with] enough room for the extra load of running backups."
The standard way to deal with this is to move the backup and recovery load off the application server and onto the storage tier. You can then use replication tools to snapshot the virtual machine in the background and move the copy to a dedicated backup server.
Most of the major backup software suppliers and enterprise storage vendors have invested time and money in understanding server virtualisation; as a result, they all now have tools to protect virtual servers, says McCreath, although he adds that they could all do more.
"It's come a long way from writing Perl scripts to cope with VMware, but there's still a long way to go," he says.
The snapshot as a backup volume
A key development is that some backup tools no longer require you to mount a VM image to recover a file. For example, Symantec and Veeam let you pull individual files out of a snapshot. That avoids the need to keep a spare machine on hand to restore the VM image, says Innes.
He adds that it's also important to store the data separately from the application server, not least because doing so offers much more opportunity for deduplication. In some cases, such as webserver clusters, it also means application servers can simply be cloned or deleted as needed.
However, Innes warns that all this will add to the load on the data centre's storage back end. As a result, he predicts we'll see renewed interest in storage networking.
"As we move to server virtualisation, you need more performance, so you need a SAN," he says. "I even see people expanding their Fibre Channel [FC] SANs and buying new ones," he adds, "because if you look at the cost of running 10 Gig iSCSI versus 8 Gig Fibre Channel, the latter might actually be simpler."
Server virtualisation also means more data to back up. This is partly because it tends to result in a big jump in the server population -- virtual server sprawl -- which means there will be more images to manage and store.
This is where deduplication technology, such as that offered by CommVault, Data Domain or EMC Corp.'s Avamar, can help. Deduplication understands that it's protecting multiple VMs and avoids storing duplicated data by looking for repeated patterns in the files and storing only the first iteration.
This reduces the amount of data that must be backed up, with users such as the Associated Newspapers' Bruck reporting compression ratios as high as 50:1 in some cases.
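The principle behind those ratios can be shown with a minimal block-level deduplication sketch. This assumes fixed-size chunks and SHA-256 fingerprints for simplicity; products such as Data Domain's use variable-size chunking and their own fingerprinting, but the idea of storing each repeated block only once is the same.

```python
import hashlib

CHUNK = 4096  # bytes per block

def dedup_store(data: bytes, store: dict) -> list:
    """Split data into blocks and store each unique block once.

    Returns the 'recipe' of fingerprints needed to rebuild the data;
    identical blocks across many VM images share one stored copy.
    """
    recipe = []
    for i in range(0, len(data), CHUNK):
        block = data[i:i + CHUNK]
        fp = hashlib.sha256(block).hexdigest()
        store.setdefault(fp, block)  # only the first iteration is kept
        recipe.append(fp)
    return recipe

def rebuild(recipe: list, store: dict) -> bytes:
    """Reassemble the original data from its fingerprint recipe."""
    return b"".join(store[fp] for fp in recipe)
```

Many VMs cloned from the same template share most of their blocks, which is why deduplication ratios on virtual server estates can be so dramatic.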
Deduplication can also reduce the volume of data that must be transferred over the WAN to a secondary data centre for DR. However, you'll need to ensure that deduplication is done at the source, and not -- as some schemes do it -- once the data is at rest at the target.
A second reason why server virtualisation needs more backup capacity is overallocation. For example, VMware suggests administrators allocate each VM twice the amount of disk space they actually plan to write to. But most of that space will go unused and, with many VMs involved, the waste will soon mount up.
There are several possible solutions. An obvious one is thin provisioning, which saves storage space by only allocating physical capacity as it's used. But there are also standalone tools such as Vizioncore's vOptimizer Pro, which can reclaim space by automatically inspecting and resizing a VM's file system.
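The scale of the waste is easy to estimate. A back-of-the-envelope sketch, assuming the 2x allocation guideline mentioned above; the figures are illustrative, not drawn from any vendor's sizing guide:

```python
def wasted_gb(vm_count: int, used_gb_per_vm: float,
              overalloc: float = 2.0) -> float:
    """Space reserved but never written under thick provisioning.

    With thin provisioning, physical capacity is only consumed as data
    is actually written, so this reserved-but-unused space disappears.
    """
    allocated = vm_count * used_gb_per_vm * overalloc
    used = vm_count * used_gb_per_vm
    return allocated - used
```

For example, 30 VMs each writing 50 GB but allocated double that would tie up 1,500 GB of capacity that thin provisioning could leave in the free pool.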
One company using thin provisioning for virtual servers is Volkswagen Financial Services (UK) in Milton Keynes, where senior network specialist Mike Duxbury says it has saved his team from the need to be cautious when allocating storage to a new VM. He adds that the company even runs its DataCore Software Corp.-based SAN servers as virtual machines.