As server virtualization assumes a greater role in the enterprise,
administrators face a proliferation of virtual machines residing on
the same physical server. Each virtual machine uses a portion of
the physical machine's processing, memory and I/O resources.
Ideally, server virtualization provides a means of increasing
hardware utilization. But as more "logical" servers are
consolidated into fewer "physical" computer systems, it's important
to protect each virtual machine's data against failure or loss.
Virtual server backups are the key to providing this protection.
This article examines how virtual server backup can
be achieved using a mix of traditional backup techniques and
specialized virtualization tools, highlights important
deployment issues and looks at several real-world users.
What is virtual server backup?
A virtual machine is a complete logical environment existing as
a separate entity on a physical server. Each virtual machine is
treated and perceived as if it were physical. In fact, a user
cannot tell the difference between a real and a virtual machine. A
data center may host thousands of virtual machines running on only
a fraction as many physical servers, and this presents a serious
problem for storage or backup administrators. Data loss on a
virtual server
can be just as catastrophic as data loss on a physical server, so
every virtual server must be backed up as part of a company's
backup regimen.
Virtual server backups can be accomplished using a traditional
approach with conventional backup software. The backup software is
simply installed and configured on each virtual machine, and
backups will run normally to any conventional backup target,
including tape drives, virtual tape libraries (VTL) or disk
storage. "That's probably the most popular way that people do it
today because it's familiar," says Lauren Whitehouse, analyst with
the Enterprise Strategy Group (ESG). "It ensures a consistent
backup; it will give you the granular recovery that you're looking
for, and it's application-specific."
However, applying traditional backup tactics to virtual server
backups does have drawbacks. The most significant problem is
resource contention. Backups demand significant processing power,
and the added resources needed to execute a backup may compromise
the performance of that virtual machine and all virtual machines
running on the system. "Don't go for 100% utilization," says Greg
Schulz, founder and senior analyst at the Storage IO Group. Leave
some server resources unused to accommodate backup tasks and
stagger backup processes so that only one virtual machine is being
backed up on any physical system at one time.
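The staggering advice above can be sketched in a few lines of
Python. This is only an illustration, not a feature of any backup
product: the function name, the VM and host names and the fixed
45-minute slot are all hypothetical, and the sketch assumes every
backup finishes within its slot.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def stagger_backups(vm_hosts, window_start, slot_minutes):
    """Give each VM a start time so that VMs sharing a physical host
    never start in the same slot. VMs on different hosts may overlap.

    Assumption: each backup completes within one slot.
    """
    next_slot = defaultdict(int)  # host -> index of its next free slot
    schedule = {}
    for vm, host in sorted(vm_hosts.items()):
        offset = timedelta(minutes=slot_minutes * next_slot[host])
        schedule[vm] = (host, window_start + offset)
        next_slot[host] += 1
    return schedule


if __name__ == "__main__":
    # Hypothetical inventory: VM name -> physical host it runs on.
    vms = {"web01": "hostA", "db01": "hostA",
           "mail01": "hostB", "app01": "hostA"}
    plan = stagger_backups(vms, datetime(2024, 1, 1, 22, 0), slot_minutes=45)
    for vm, (host, start) in sorted(plan.items(), key=lambda kv: kv[1]):
        print(f"{host}  {start:%H:%M}  {vm}")
```

In practice the slot length would have to come from measured backup
durations, and the headroom Schulz recommends still has to exist on
each host for the one backup that is running.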
Installing backup software on every virtual machine means far more
installations to manage, and this can make your backup
process far more costly. Also, traditional backups will copy
programs and application data but do not necessarily capture the
entire virtual machine state. This may be fine if your only goal is
to preserve an application, such as a database, but a failed
virtual machine may need to be recreated and reconfigured from
scratch before the backup can be restored.
Virtualization-specific tools, such as VMware Consolidated
Backup (VCB) or Microsoft's Virtual Machine Manager (VMM),
interface directly with their respective virtualization platforms
and capture point-in-time snapshots of an entire VMware Virtual
Machine Disk (VMDK) or Microsoft Virtual Hard Disk (VHD) file.
Tools like VCB or VMM can capture the entire virtual machine state
quickly, and the virtual machine typically does not need to be
quiesced or taken offline. Not only does this allow for fast,
complete system
restorations, but complete snapshots can also be uploaded to new
virtual machines, allowing system administrators to "clone" virtual
servers on demand.
The downside to image-level backups is a potential loss in
granularity. With traditional backups, it is easy to restore a
single application or data file. When there is one single VMDK or
VHD file, you typically have to restore the entire snapshot in
order to recover, even if only one file is lost or corrupted. "Some
snapshot vendors have figured out how to take that image-level
backup and break it down into the granular single files that people
need to recover," Whitehouse says. "Not everyone has done that
though."
How are virtual server backups implemented?
Storage space poses a particular challenge for virtual machine
files. The virtual snapshot is always seen as a new file, so it is
backed up in its entirety, regardless of how much data has actually
changed since the last snapshot. Snapshots will continue to use the
full backup window and consume the same amount of disk/tape space.
Data deduplication, of which single-instance storage is one form,
can help to reduce these storage demands. However, deduplicating at
the storage system doesn't shrink the backup window because data
still must be transferred across the network prior to
deduplication. Experts
suggest deduplicating through an appliance or at the source to save
backup media while minimizing the backup window.
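To make the storage savings concrete, here is a minimal sketch of
block-level deduplication in Python. It is an illustration under
stated assumptions, not how any particular product works: the
`ChunkStore` class, the fixed 4 KB chunk size and the use of SHA-256
digests are all choices made for this example, and commercial
deduplication engines often use variable-size chunking instead.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size, chosen for illustration


class ChunkStore:
    """Keeps one copy of each unique chunk, keyed by its SHA-256 digest."""

    def __init__(self):
        self.chunks = {}  # digest -> chunk bytes

    def store_snapshot(self, data):
        """Store a snapshot image; return (recipe, bytes newly stored)."""
        recipe = []      # ordered digests needed to rebuild the image
        new_bytes = 0
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:  # only unseen chunks cost space
                self.chunks[digest] = chunk
                new_bytes += len(chunk)
            recipe.append(digest)
        return recipe, new_bytes

    def restore(self, recipe):
        """Rebuild a snapshot image from its recipe."""
        return b"".join(self.chunks[d] for d in recipe)


if __name__ == "__main__":
    store = ChunkStore()
    # Stand-in for a disk image: 100 chunks with distinct contents.
    monday = b"".join(bytes([i]) * CHUNK_SIZE for i in range(100))
    tuesday = bytearray(monday)
    tuesday[42 * CHUNK_SIZE:43 * CHUNK_SIZE] = b"\xff" * CHUNK_SIZE
    _, first = store.store_snapshot(monday)
    _, second = store.store_snapshot(bytes(tuesday))
    print(f"first snapshot stored {first} bytes, second stored {second}")
```

The second snapshot is still read in full, which mirrors the point
above: deduplicating at the target saves media but not transfer
time. Running the same hashing at the source is what lets unchanged
chunks be skipped before they cross the network.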
Virtual server backups have no specific affinity for backup
targets. Traditional backups can go to tape, VTL or other disk
systems as they do now, though most performance-minded users will
back up to some form of disk storage first, then offload the
backup to tape later. VCB or VMM backups are almost universally
sent to disk, then later replicated to offsite disk storage or sent
to tape. Backup media is then retained or stored exactly the same
way as conventional backups. However, retention periods should be
evaluated carefully; it may not be necessary to save every snapshot
for a prolonged period. Consult your local retention experts or
legal counsel for their recommendations.
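As one way to picture a tiered retention rule, the Python sketch
below keeps every recent snapshot, thins older ones to one per week
and drops the rest. The function name and the 14-day and 90-day
thresholds are purely hypothetical; actual periods should come from
the retention and legal guidance mentioned above.

```python
from datetime import date, timedelta


def snapshots_to_keep(snapshot_dates, today,
                      keep_daily_days=14, keep_weekly_days=90):
    """Return the set of snapshot dates to retain.

    Hypothetical policy: keep every snapshot up to keep_daily_days
    old, keep the newest snapshot of each ISO week up to
    keep_weekly_days old, and discard anything older.
    """
    keep = set()
    weeks_seen = set()
    for snap in sorted(snapshot_dates, reverse=True):  # newest first
        age = (today - snap).days
        if age <= keep_daily_days:
            keep.add(snap)
        elif age <= keep_weekly_days:
            week = tuple(snap.isocalendar()[:2])  # (ISO year, ISO week)
            if week not in weeks_seen:
                weeks_seen.add(week)
                keep.add(snap)
    return keep


if __name__ == "__main__":
    today = date(2024, 3, 31)
    nightly = [today - timedelta(days=i) for i in range(120)]
    kept = snapshots_to_keep(nightly, today)
    print(f"{len(kept)} of {len(nightly)} snapshots retained")
```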
Virtual server backups should also be verified and tested
periodically to ensure that the required suite of data has been
captured adequately. This typically involves restoring the
backup to another virtual server and verifying normal operation.
For some shops that perform frequent restorations, the "testing"
process is ongoing; backups are tested each time a file or
application needs to be restored. Other virtualized shops have
auxiliary machines available for testing purposes, which allows
administrators to periodically test backups without taking the
original production machines offline.
Who is doing virtual server backups?
For Young America Corp., the customer fulfillment business
generates a great deal of customer data. Close to 20 terabytes (TB)
of production data and another 10 TB of development and test data
is spread across several EMC Corp. platforms running under VMware
Inc.'s Infrastructure 3 virtualization software. Virtualization has
proven its benefit to the organization. "The No. 1 reason [benefit]
is efficient use of resources," says Dan Thompson, network engineer
at Young America. "Secondary reasons include ease of backups and
disaster recovery."
Thompson backs up virtual machines using VCB operated in concert
with EMC's Legato backup software. Virtual server backups are
performed nightly along with the entire backup process and are also
performed on demand. The entire backup process takes about six to seven
hours each night, but with about 160 servers to contend with, half
of them virtual servers, it's difficult to say exactly how long a
single virtual machine backup takes.
In addition to protecting existing virtual servers, Thompson
also uses virtual snapshots to clone new servers. "You can use VCB
to actually save a copy of a virtual machine 'hot,' then you can
restore it to another virtual machine and bring it up as a clone of
the first one," he says.
An EMC Clariion Disk Library (CDL) provides virtual tape
support. "The backup application backs up to that and also to
actual [IBM] tape, so we go to both," Thompson says, noting that
the current LTO-3 tape drives will soon be upgraded to LTO-4.
Although Thompson has never needed to restore a failed virtual
machine, the restoration process has been thoroughly proven and is
tested monthly or even more frequently.
Thompson notes that virtualization has proven reliable since
the resolution of some early difficulties. "We had virtual machines
lock up when VCB is executed that we attributed to outdated VMware
drivers and tools," he says. "With that updated, those virtual
machines haven't had a problem since." This underscores the
importance of software maintenance and version control in the
virtual environment.
Next to efficiency, flexibility in integrating infrastructures
is probably the most important benefit gained from server
virtualization. For information services business Kroll Factual
Data, the flexibility afforded by Microsoft Virtual Server 2005 R2
proved critical when integrating data centers. "We were moving an
acquired company and their technology infrastructure into our data
center, and the virtual environment was really the only way that we
could be flexible enough to tackle the integration in a timely
manner," says Christopher M. Steffen, manager of information
security and compliance.
Once the benefits of server virtualization became clear, the
entire infrastructure was migrated to a virtual server environment,
supporting more than 600 virtual machines in production (80%-85% of
the production environment). In addition, there are about 400
virtual machines in disaster recovery and another 400 virtual
machines in development. "It's a hardware-agnostic point of view,"
Steffen
says. "Any platform that runs a Windows server can support full
virtualization and really utilize your hardware to its fullest
potential." Today, Kroll Factual Data operates about 60 TB of
storage on an IBM FAStT storage server.
Steffen uses the VMM utility to manage and back up Microsoft
virtual machines. Not only does VMM help to configure and optimize
the virtual environment, it also creates backup snapshots of the
VHD file. Steffen also uses VMM to create standard server "images"
that speed the deployment of new virtual servers, while helping to
prove the compliance of software/driver versions across the
environment. "Instead of configuring a new server from scratch,
which can take two-to-four hours, just take and copy the hardened
image that you've already created and patched correctly up to the
host machine -- that takes 10-to-15 minutes," he says.
Almost all virtual machine backups are performed through VMM,
though there are still some manual backup processes to accommodate
mission-critical processes that have not yet been virtualized. The
actual time needed to back up a virtual server depends on the size
of the VHD file and the bandwidth available to pass the backup data
to the target. Backups are always sent to disk first, then
offloaded to tape as a separate process.
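The relationship between image size, bandwidth and backup time is
simple arithmetic, sketched below in Python. The function name and
the `efficiency` factor (the share of nominal link speed a backup
actually sustains) are assumptions made for illustration; real
throughput also depends on disk speed, protocol overhead and
competing traffic.

```python
def estimate_backup_hours(image_size_gb, link_mbps, efficiency=0.7):
    """Rough transfer-time estimate for one virtual disk image.

    image_size_gb: image size in decimal gigabytes
    link_mbps:     nominal link speed in megabits per second
    efficiency:    assumed fraction of link speed actually achieved
    """
    if image_size_gb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, speed > 0, 0 < efficiency <= 1")
    size_megabits = image_size_gb * 1000 * 8  # GB -> MB -> megabits
    seconds = size_megabits / (link_mbps * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical example: a 100 GB image over a 1,000 Mbps link.
    hours = estimate_backup_hours(100, 1000)
    print(f"about {hours * 60:.0f} minutes")
```

Halving the available bandwidth or doubling the image size doubles
the estimate, which is why large images on shared links are the
first candidates for off-hours scheduling.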
The ability to configure disaster recovery sites virtually
anywhere that power and Internet access are available was an
important benefit, according to Steffen. "Virtualization makes the
whole disaster recovery 'mess' actually something that is
manageable," he says. "And VMM helps with configuration management,
update migration and so on." VMM provides load-balancing
recommendations that can help to optimize the number of virtual
machines on each particular server.
What is the future of virtual server backups?
Storage volumes will continue to grow, and this will inevitably
lead to a demand for more network storage for virtual machine
backups. This will also usher in greater application awareness and
data deduplication with virtual server backups. The real challenge
will be to implement deduplication without compromising virtual
machine performance. "If you run dedupe on a VM, you'll put more
workload on the VM [CPU]," Schulz says. In the near term, an
external data deduplication appliance may be necessary to achieve
necessary performance goals. There are other performance issues
with server virtualization that will be increasingly addressed
using optimized hardware chipsets, such as Intel Corp.'s vPro
Processor Technology and Q35 Express Chipset.
Security also deserves attention. Conventional backups rely upon
backup software for proper restoration, which affords a small
amount of native security. Virtual machine snapshots, by contrast,
are complete self-standing system images that are far simpler to
restore than a backup volume, and so potentially easier to misuse
if they fall into the wrong hands. Encryption is another component
in the virtual backup environment, but few virtualization users
have made security a major priority yet.
Ultimately, the future of virtualization-specific backup tools
remains murky. Experts note
that virtualization vendors may shift the backup burden to
third-party developers. "I think the first step for them
[virtualization vendors] would be to create APIs for backup
vendors," Whitehouse says, noting that backup vendors could then
build new applications or add features to their existing backup
products that would utilize those APIs to provide better and more
refined backup products.