Key choices in virtual machine backup

Virtual machine backup is a priority for many datacentres. We examine the key choices, and whether to use agent-, hypervisor- or array-based backup

Server and desktop virtualisation has swept through datacentres, but as apps and data are virtualised, they must also be protected. That is why virtual machine backup is one of the most pressing tasks facing IT departments today.

But there are no straightforward choices. Backing up virtual machines creates new challenges that require a different approach to the problem of backup and restore.

Traditional physical server backup systems typically use host agents to collect data, managed by a central backup server. Applied to virtual machines, this method requires an agent to be deployed, configured and maintained in every guest.

With large numbers of virtual machines, management of agents can become troublesome and result in a lot of wasted time trying to configure short-lived VMs.

By their design, virtual environments lead to consolidation of resources, but this is also their downfall where backup is concerned.

Physical servers provide dedicated processing power and network connectivity to an application, but virtual environments pack many apps running on virtual servers into one physical host. This can cause huge oversubscription of processor, network and storage resources, and major bottlenecks when many concurrent backups create high levels of random I/O.

Such contention problems can affect the performance of applications that run on virtual machines, even if they are not those being backed up.

Virtual machine backup methods

Backups of virtual machines can be performed in the traditional way, with one agent per virtual server. As mentioned above, this is not the most efficient solution because it can result in contention at the network and storage layers, but there are some alternatives.

Hypervisor-based VM backup

This method uses features in the hypervisor to synchronise and extract backup data. For example, VMware vSphere environments can make use of VADP (VMware vStorage APIs for Data Protection) and CBT (changed block tracking), while Hyper-V systems use VSS, the Volume Shadow Copy Service. Hypervisor-based solutions can also be used to implement CDP (continuous data protection) methodologies that use CBT, allowing data to be restored at a granular block level.
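The idea behind changed block tracking can be illustrated with a minimal Python sketch. This is a toy model, not the VADP or CBT interface: the `TrackedDisk` class, the `restore` function and the four-byte block size are all assumptions made for illustration. The disk records which blocks have been written since the last backup, so each incremental backup copies only those blocks.

```python
BLOCK_SIZE = 4  # bytes per block; deliberately tiny for the example


class TrackedDisk:
    """Toy virtual disk that records which blocks change between backups."""

    def __init__(self, num_blocks):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]
        # Before the first backup every block counts as changed,
        # so the first backup is a full one.
        self.changed = set(range(num_blocks))

    def write(self, index, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block")
        self.blocks[index] = data
        self.changed.add(index)

    def backup(self):
        """Return the blocks changed since the last backup, then reset tracking."""
        delta = {i: self.blocks[i] for i in sorted(self.changed)}
        self.changed.clear()
        return delta


def restore(backups, num_blocks):
    """Rebuild a disk image by replaying a full backup and its increments in order."""
    image = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]
    for delta in backups:
        for index, data in delta.items():
            image[index] = data
    return image


disk = TrackedDisk(8)
full = disk.backup()           # first backup copies all 8 blocks
disk.write(2, b"abcd")
disk.write(5, b"wxyz")
incremental = disk.backup()    # second backup copies only blocks 2 and 5
restored = restore([full, incremental], 8)
```

Because each backup holds individual blocks, a restore can replay increments up to any chosen point, which is the block-level granularity the CDP approach relies on.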

Hardware-based VM backup

It is possible to take backups through the shared storage infrastructure by making a copy of virtual machine image files. This method is more suited to NFS or SMB-provisioned virtual machines, because the data is easily exportable as a file share snapshot.

Virtual machine backup choices

Choosing the right backup strategy means examining a number of factors that influence the effectiveness of virtual machine backup and restore.

Application consistency

The question here is: how do we ensure that data within the application is consistent?

Agent- and hypervisor-based solutions can communicate with the guest operating system and application through in-guest components such as VMware Tools and VSS. These allow the backup software to ask the operating system and application to flush outstanding data to disk and enter a quiesced state while the backup is taken. This process can reduce or eliminate the need to perform additional forward recovery of the application when restoring from a backup.
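The flush, quiesce, snapshot and resume sequence can be sketched in Python as follows. This is an illustration under stated assumptions rather than the VMware Tools or VSS interface: `GuestApp`, `quiesced`, `take_snapshot` and `consistent_backup` are hypothetical names, and the "application" is just an in-memory write buffer.

```python
from contextlib import contextmanager


class GuestApp:
    """Hypothetical stand-in for an application running inside a guest."""

    def __init__(self):
        self.pending = ["write-1", "write-2"]  # data still buffered in memory
        self.on_disk = []                      # data already flushed to disk
        self.frozen = False

    def flush(self):
        """Move all buffered writes to disk."""
        self.on_disk.extend(self.pending)
        self.pending.clear()

    def freeze(self):
        self.frozen = True

    def thaw(self):
        self.frozen = False


@contextmanager
def quiesced(app):
    """Flush and freeze the application, and always thaw it afterwards."""
    app.flush()
    app.freeze()
    try:
        yield app
    finally:
        app.thaw()


def take_snapshot(app):
    """Copy what is on disk; a snapshot cannot see data still held in memory."""
    return list(app.on_disk)


def consistent_backup(app):
    """Take the snapshot only while the application is quiesced."""
    with quiesced(app):
        return take_snapshot(app)


app = GuestApp()
crash_consistent = take_snapshot(app)   # taken without quiescing: misses buffered writes
app_consistent = consistent_backup(app)  # taken after flush: contains both writes
```

The `finally` clause matters in practice: whatever happens during the snapshot, the application should be released from its quiesced state.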

Backups taken from storage-based snapshots are not application or operating system consistent by default. But some suppliers provide plugins and additional management software that co-ordinates the taking of snapshots in the array while managing the application quiescence process.

Backup object abstraction

Virtual machines are very fluid objects. For example, features such as vSphere Distributed Resource Scheduler (DRS), vMotion, Storage vMotion and Live Migration see virtual machines moved between physical hosts and storage at will.

Backup solutions must be able to back up and restore virtual machines to logical locations rather than physical ones. This can, for example, make recovery that uses hardware-based solutions much less practical than agent- or hypervisor-based ones.

Granular recovery

Data protection is all about the efficiency of the restore process, so a key factor is the level of granularity to which restore can be carried out. In many instances, restoring the entire VM is too cumbersome or time-consuming, especially when only a few files are needed.

Agent-based backups can restore individual files, which means that in mixed physical and virtual environments, the restore process is simplified across the two platforms.

With hypervisor-based backup, virtual machine disks are stored as single large files (VMDKs or VHDs), so extracting data from them means understanding the format of the disk file and providing facilities to access the data.

Some solutions take this a step further and provide an “instant” recovery option that allows the virtual machine to be booted directly from the backup image to extract files for recovery.

Offsite copies

Backups are taken for many purposes, including protection against data corruption, hardware/application failure and user error. In most instances, there is a requirement to take a copy of data offsite to another location as part of a business continuity/disaster recovery strategy.

Hardware-based solutions are typically good at managing this process through replication of the backup data. Agent and hypervisor solutions that write backup data to a backup appliance can also implement this feature.

Virtual appliance

An important consideration is the delivery of the backup software itself. In many instances, deploying a separate physical server for backup may not be the most efficient solution. Many suppliers offer virtual appliances that can run in the virtual server infrastructure.


Licensing

Many traditional backup systems base their licensing on physical server (or client) count. This is not the most efficient model for virtual environments, where virtual machines may come and go on a regular basis, making the process of tracking licences cost-prohibitive. More efficient cost models are based purely on backup capacity, making cost management much easier.

Physical and virtual: further challenges

The use of physical and virtual servers in the same environment means it can be difficult to find one backup product to fit all requirements because physical and virtual platforms use different backup and restore methods.

This can be a problem for operations staff in large environments unless additional tracking information is available to monitor the backup/restore process.

Imagine an environment with 1,000-plus servers across multiple virtual and physical platforms. If you process a restore request for server X, how do you know which platform it resides on and which backup tool was used to back it up? 

With larger-scale environments, there needs to be a way to identify the server location. This could be via the naming of servers, or the use of a CMDB (configuration management database) that indicates which site, which hypervisor, and so on. 

Alternatively, the backup software could integrate with the hypervisor management tool, so administrators can right-click a virtual machine there and open the backup/restore menu directly.

Tracking can also be an issue in environments that are midway through a physical-to-virtual (P2V) migration. Two backup regimes have to be maintained, one for before the migration point and one for after it, along with a record of when the transition occurred.

Over time, multiple backup products may have been used, so you need to know which backup platform was used when, because not all restores will occur from the latest backup. Sometimes you need to go to a specific date for a restore and if different products have been used for physical and virtual, you need to know which one to use for the restore.
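One way to keep this information is a small catalogue that records, for each server, which platform and backup product applied over which dates. The Python sketch below is a minimal illustration, not a feature of any particular CMDB or backup product: `BackupRecord`, `find_backup_product`, the server name `srv-x`, the product names and the dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class BackupRecord:
    server: str
    platform: str         # for example "physical" or "virtual"
    product: str          # backup product in use (hypothetical names below)
    start: date           # first day this backup regime applied
    end: Optional[date]   # last day it applied; None means still current


def find_backup_product(records, server, restore_date):
    """Return the record that covered `server` on `restore_date`, or None."""
    for rec in records:
        if rec.server != server:
            continue
        if rec.start <= restore_date and (rec.end is None or restore_date <= rec.end):
            return rec
    return None


# Example: srv-x went through a P2V migration on 1 June 2013.
CATALOGUE = [
    BackupRecord("srv-x", "physical", "ProductA", date(2012, 1, 1), date(2013, 5, 31)),
    BackupRecord("srv-x", "virtual", "ProductB", date(2013, 6, 1), None),
]

before = find_backup_product(CATALOGUE, "srv-x", date(2013, 3, 1))
after = find_backup_product(CATALOGUE, "srv-x", date(2013, 9, 1))
```

Because the catalogue sits outside the hypervisor management tool, its records can be kept after a server is deleted, so a later restore request can still be matched to the right backup system.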

Virtual machines are also more transient than physical servers. A virtual machine may move between hosts and between storage via features such as VMware vMotion and Storage vMotion. This can make restores harder to manage if the backup software does not easily allow an alternative location to be specified.

Most products use the logical location of the server. For example, in vCenter, you reference the server for backup/restore through the “datacentre” object.

But vCenter integration will not help if you have deleted a server and, in six months' time, need to restore it. Without an external CMDB/reference/documentation, you wouldn't know where it resided and on which backup system.

Virtual machine backup landscape

Traditional backup software suppliers have modified their products to produce VM-aware versions.

Symantec’s NetBackup provides multiple features for virtual environments, including block-based backups using CBT, synchronisation with NetApp filer snapshots, instant VM recovery and physical server recovery to virtual instances. Symantec’s mid-range Backup Exec product comes in a V-Ray edition for virtual environments.

Veeam has been remarkably successful in targeting the small and mid-range market for its backup product suite. Its products provide coverage for vSphere and Hyper-V platforms and have a range of free options.

Microsoft provides Windows Server Backup natively from Windows Server 2008 onwards and also offers the standalone Data Protection Manager product, which covers application backups as well as virtual servers and Windows desktops.

There are also products from a range of other companies, including Acronis, EMC, PHD Virtual, Quest (owned by Dell), FalconStor, Commvault and Zerto, which we will cover in more depth in a separate article.
