As server virtualization assumes a greater role in the enterprise, administrators face a proliferation of virtual machines residing on the same physical server. Each virtual machine uses a portion of the physical machine's processing, memory and I/O resources. Ideally, server virtualization provides a means of increasing hardware utilization. But as more "logical" servers are consolidated into fewer "physical" computer systems, it's important to protect each virtual machine's data against failure or loss. Virtual server backups are the key to providing this protection. This article examines how virtual server backup can be achieved using a mix of traditional backup techniques and specialized virtualization tools, highlights important deployment issues and looks at several real-world users.
What is virtual server backup?
A virtual machine is a complete logical environment existing as a separate entity on a physical server. Each virtual machine is treated and perceived as if it were physical. In fact, a user cannot tell the difference between a real and a virtual machine. A data center may host thousands of virtual machines running on only a fraction of that number of physical servers, and this presents a serious problem for storage or backup administrators. Data loss on a virtual server can be just as catastrophic as data loss on a physical server, so every virtual server must be backed up as part of a company's backup regimen.
Virtual server backups can be accomplished using a traditional approach with conventional backup software. The backup software is simply installed and configured on each virtual machine, and backups will run normally to any conventional backup target, including tape drives, virtual tape libraries (VTL) or disk storage. "That's probably the most popular way that people do it today because it's familiar," says Lauren Whitehouse, analyst with the Enterprise Strategy Group (ESG). "It ensures a consistent backup; it will give you the granular recovery that you're looking for, and it's application-specific."
However, applying traditional backup tactics to virtual server backups does have drawbacks. The most significant problem is resource contention. Backups demand significant processing power, and the added resources needed to execute a backup may compromise the performance of that virtual machine and all virtual machines running on the system. "Don't go for 100% utilization," says Greg Schulz, founder and senior analyst at the Storage IO Group. Leave some server resources unused to accommodate backup tasks and stagger backup processes so that only one virtual machine is being backed up on any physical system at one time.
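The staggering advice above can be sketched as a simple scheduler. This is only an illustration, not any vendor's scheduling feature: the inventory, the host and VM names, the start of the backup window and the fixed one-hour slot per backup are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def stagger_backups(vms, window_start, slot_minutes=60):
    """Give each VM a start time so that no two VMs on the same
    physical host are backed up at the same time.

    vms: list of (vm_name, host_name) pairs (hypothetical inventory).
    slot_minutes: assumed worst-case duration of a single VM backup.
    """
    next_slot = defaultdict(int)  # host -> index of its next free slot
    schedule = []
    for vm_name, host_name in vms:
        offset = timedelta(minutes=slot_minutes * next_slot[host_name])
        schedule.append((vm_name, host_name, window_start + offset))
        next_slot[host_name] += 1
    return schedule


# Hypothetical inventory: three VMs share hostA, one runs on hostB.
inventory = [
    ("web01", "hostA"),
    ("db01", "hostA"),
    ("mail01", "hostB"),
    ("app01", "hostA"),
]
schedule = stagger_backups(inventory, datetime(2024, 1, 1, 22, 0))
for vm_name, host_name, start in schedule:
    print(f"{start:%H:%M}  {host_name}  {vm_name}")
```

Different hosts can still run backups in parallel; only virtual machines that share a physical server are serialized. In practice the slot length would have to come from measured backup durations rather than a fixed guess.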
Installing the backup software on every virtual machine also multiplies the number of installations to manage, and this can make your backup process far more costly. Also, traditional backups will copy programs and application data but do not necessarily capture the entire virtual machine state. This may be fine if your only goal is to preserve an application, such as a database, but a failed virtual machine may need to be recreated and reconfigured from scratch before the backup can be restored.
Virtualization-specific tools, such as VMware Consolidated Backup (VCB) or Microsoft's Virtual Machine Manager (VMM), interface directly with their respective virtualization platforms and capture point-in-time snapshots of the entire VMware Virtual Machine Disk (VMDK) or Microsoft Virtual Hard Disk (VHD). Tools like VCB or VMM can capture the entire virtual machine state quickly, and the virtual machine typically does not need to be quiesced or taken offline. Not only does this allow for fast, complete system restorations, but complete snapshots can also be uploaded to new virtual machines, allowing system administrators to "clone" virtual servers on demand.
The downside to virtual server snapshot files is a potential loss in granularity. With traditional backups, it is easy to restore a single application or data file. When there is one single VMDK or VHD file, you typically have to restore the entire snapshot in order to recover, even if only one file is lost or corrupted. "Some snapshot vendors have figured out how to take that image-level backup and break it down into the granular single files that people need to recover," Whitehouse says. "Not everyone has done that though."
How are virtual server backups implemented?
Storage space poses a particular challenge for virtual machine files. The virtual snapshot is always seen as a new file, so it is backed up in its entirety, regardless of how much data has actually changed since the last snapshot. Snapshots will continue to use the full backup window and consume the same amount of disk/tape space. Data deduplication, also called single-instance storage, can help to reduce these storage demands. Deduplicating at the storage system doesn't shrink the backup window because data still must be transferred across the network prior to deduplication. Experts suggest deduplicating through an appliance or at the source to save backup media while minimizing the backup window.
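A minimal sketch of the deduplication idea, assuming fixed-size 4 KB blocks and SHA-256 fingerprints, shows why a second snapshot that differs only slightly from the first needs very little new storage. The function name, the block size and the synthetic "disk images" are assumptions made for illustration; commercial products use their own chunking and indexing schemes.

```python
import hashlib


def dedup_store(snapshot: bytes, store: dict, block_size: int = 4096) -> int:
    """Add a snapshot to a content-addressed block store.

    Returns the number of bytes newly written. Blocks whose SHA-256
    digest is already present in the store are skipped.
    """
    new_bytes = 0
    for offset in range(0, len(snapshot), block_size):
        block = snapshot[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:
            store[digest] = block
            new_bytes += len(block)
    return new_bytes


BLOCK = 4096
# Synthetic first snapshot: 100 distinct 4 KB blocks.
monday = b"".join(i.to_bytes(4, "big") * (BLOCK // 4) for i in range(100))
# Second snapshot: the same image with a single block overwritten.
changed = bytearray(monday)
changed[5 * BLOCK:6 * BLOCK] = b"\xff" * BLOCK
tuesday = bytes(changed)

store = {}
first = dedup_store(monday, store)    # every block is new
second = dedup_store(tuesday, store)  # only the changed block is new
print(first, second)
```

If the fingerprinting runs at the source, only the new blocks have to cross the network, which is why source-side deduplication can shorten the backup window as well as save media.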
Virtual server backups have no specific affinity for backup targets. Traditional backups can go to tape, VTL or other disk systems as they do now, though most performance-minded users will back up to some form of disk storage first, then offload the backup to tape later. VCB or VMM backups are almost universally sent to disk, then later replicated to offsite disk storage or sent to tape. Backup media is then retained or stored exactly the same way as conventional backups. However, retention periods should be evaluated carefully; it may not be necessary to save every snapshot for a prolonged period. Consult your local retention experts or legal counsel for their recommendations.
Virtual server backups should also be verified and tested periodically to ensure that the required suite of data has been captured adequately. This typically involves restoring the backup to another virtual server and verifying normal operation. For some shops that perform frequent restorations, the "testing" process is ongoing; backups are tested each time a file or application needs to be restored. Other virtualized shops have auxiliary machines available for testing purposes, which allows administrators to periodically test backups without taking the original production machines offline.
Who is doing virtual server backups?
For Young America Corp., the customer fulfillment business generates a great deal of customer data. Close to 20 terabytes (TB) of production data and another 10 TB of development and test data are spread across several EMC Corp. platforms running under VMware Inc.'s Infrastructure 3 virtualization software. Virtualization has proven its benefit to the organization. "The No. 1 reason [benefit] is efficient use of resources," says Dan Thompson, network engineer at Young America. "Secondary reasons include ease of backups and disaster recovery."
Thompson backs up virtual machines using VCB operated in concert with EMC's Legato backup software. Virtual server backups are performed nightly along with the entire backup process and are also performed on demand. The entire backup process takes about six to seven hours each night, but with about 160 servers to contend with, half of them virtual servers, it's difficult to say exactly how long a single virtual machine backup takes.
In addition to protecting existing virtual servers, Thompson also uses virtual snapshots to clone new servers. "You can use VCB to actually save a copy of a virtual machine 'hot,' then you can restore it to another virtual machine and bring it up as a clone of the first one," he says.
An EMC Clariion Disk Library (CDL) provides virtual tape support. "The backup application backs up to that and also to actual [IBM] tape, so we go to both," Thompson says, noting that the current LTO-3 tape drives will soon be upgraded to LTO-4. Although Thompson has never needed to restore a failed virtual machine, the restoration process has been thoroughly proven and is tested monthly or even more frequently.
Thompson notes that virtualization has proven reliable since the resolution of some early difficulties. "We had virtual machines lock up when VCB is executed that we attributed to outdated VMware drivers and tools," he says. "With that updated, those virtual machines haven't had a problem since." This underscores the importance of software maintenance and version control in the virtual environment.
Next to efficiency, flexibility in integrating infrastructures is probably the most important benefit gained from server virtualization. For information services business Kroll Factual Data, the flexibility afforded by Microsoft Virtual Server 2005 R2 proved critical when integrating data centers. "We were moving an acquired company and their technology infrastructure into our data center, and the virtual environment was really the only way that we could be flexible enough to tackle the integration in a timely manner," says Christopher M. Steffen, manager of information security and compliance.
Once the benefits of server virtualization became clear, the entire infrastructure was migrated to a virtual server environment, supporting more than 600 virtual machines in production (80%-85% of the production environment). In addition, there are about 400 virtual machines in disaster recovery and another 400 in development. "It's a hardware-agnostic point of view," Steffen says. "Any platform that runs a Windows server can support full virtualization and really utilize your hardware to its fullest potential." Today, Kroll Factual Data operates about 60 TB of storage on an IBM FAStT storage server.
Steffen uses the VMM utility to manage and back up Microsoft virtual machines. Not only does VMM help to configure and optimize the virtual environment, it also creates backup snapshots of the VHD file. Steffen also uses VMM to create standard server "images" that speed the deployment of new virtual servers, while helping to prove the compliance of software/driver versions across the environment. "Instead of configuring a new server from scratch, which can take two-to-four hours, just take and copy the hardened image that you've already created and patched correctly up to the host machine -- that takes 10-to-15 minutes," he says.
Almost all virtual machine backups are performed through VMM, though there are still some manual backup processes to accommodate mission-critical processes that have not yet been virtualized. The actual time needed to back up a virtual server depends on the size of the VHD file and the bandwidth available to pass the backup data to the target. Backups are always sent to disk first, then offloaded to tape as a separate process.
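The relationship between image size, bandwidth and backup time can be made concrete with a rough estimate. The function below is a back-of-the-envelope sketch, not a figure from Kroll Factual Data: the 40 GB image, the 1,000 Mbps link and the 70% efficiency factor are assumed values chosen for illustration.

```python
def transfer_hours(size_gb: float, link_mbps: float, efficiency: float = 0.7) -> float:
    """Estimate the hours needed to move a virtual disk file to a backup target.

    size_gb: file size in decimal gigabytes.
    link_mbps: nominal link speed in megabits per second.
    efficiency: assumed fraction of the nominal speed actually achieved.
    """
    size_megabits = size_gb * 8 * 1000  # GB -> megabits
    seconds = size_megabits / (link_mbps * efficiency)
    return seconds / 3600


# Hypothetical 40 GB image over a 1,000 Mbps link.
print(round(transfer_hours(40, 1000), 2))
# The same image over a 100 Mbps link takes ten times as long.
print(round(transfer_hours(40, 100), 2))
```

The estimate covers only the data transfer. Snapshot creation, target write speed and any contention with other backups on the same link would add to the real figure.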
The ability to configure disaster recovery sites virtually anywhere that power and Internet access are available was an important benefit, according to Steffen. "Virtualization makes the whole disaster recovery 'mess' actually something that is manageable," he says. "And VMM helps with configuration management, update migration and so on." VMM provides load-balancing recommendations that can help to optimize the number of virtual machines on each particular server.
What is the future of virtual server backups?
Storage volumes will continue to grow, and this will inevitably lead to a demand for more network storage for virtual machine backups. This will also usher in greater application awareness and data deduplication with virtual server backups. The real challenge will be to implement deduplication without compromising virtual machine performance. "If you run dedupe on a VM, you'll put more workload on the VM [CPU]," Schulz says. In the near term, an external data deduplication appliance may be needed to meet performance goals. There are other performance issues with server virtualization that will be increasingly addressed using optimized hardware chipsets, such as Intel Corp.'s vPro Processor Technology and Q35 Express Chipset.
While conventional backups rely upon backup software for proper restoration, affording a small amount of native security, virtual machine snapshots are complete, self-standing system images that are far simpler to restore than a backup volume. Encryption is another component in the virtual backup environment, but few virtualization users have made security a major priority yet.
Ultimately, the future of virtualization-specific backup tools remains murky. Experts note that virtualization vendors may shift the backup burden to third-party developers. "I think the first step for them [virtualization vendors] would be to create APIs for backup vendors," Whitehouse says, noting that backup vendors could then build new applications or add features to their existing backup products that would utilize those APIs to provide better and more refined backup products.