If you want to back up virtual machines, the options may seem confusing. Methods used to back up virtual machines have evolved rapidly over the past couple of years, from an analogue of traditional backup, via an awkward two-stage process, to today’s backup apps that have been re-engineered for the virtual server age. But what challenges arise when backing up virtual servers, and why might you want to dig deep into the settings of your backup app to avoid crash-consistent database backups?
In this interview, SearchStorage.co.UK Bureau Chief Antony Adshead speaks with David Boyd, practice lead for backup and recovery with GlassHouse Technologies (UK), about the difference between backing up virtual servers vs physical servers; image-level backups of VMs; and features such as data deduplication, white-space recognition and active block mapping that are included in current products intended to back up virtual machines.
You can read the transcript below or download the podcast on virtual machine backup.
SearchStorage.co.UK: What are the differences between backing up virtual servers and physical servers?
Boyd: There are numerous … ways to back up a physical machine, but the typical approach involves an agent running on the [backup] client that reads the source data and copies it across an IP network to a backup server. The backup server in turn then transfers that backup data to the backup target. The location of each file within that backup is recorded into a database for restoration purposes.
The same approach can be used for backing up a virtual machine. In fact, it offers, from a conceptual point of view, the simplest approach. Each virtual machine can have a backup agent installed, which would read its own data and transfer that data over the network to the backup server. The backup software would have no knowledge of whether the server is virtual or physical.
However, there are serious challenges to using this approach. The backup of a virtual machine in this way puts a load on the hypervisor in terms of CPU and I/O. A backup administrator may not know on which physical machine a virtual machine resides and therefore runs the risk of scheduling several backups from the same physical host, compounding the problem and severely impacting the performance of all VMs. Furthermore, if you pay for backup software licences on a per-client basis, you will have to fork out for each virtual machine.
To get around these challenges, image-level backups were devised. When viewed from the underlying storage, a virtual machine is simply a large file -- very large in some cases. Backing up these underlying files takes the CPU load away from the virtual machine. In this approach, the virtual machine needs to be momentarily quiesced, all pending writes written and then a snapshot created. It is this snapshot that gets backed up. While this approach removes the load associated with agent backups, it is totally ignorant of the data within the virtual machine, and so live data, deleted files and white space all get backed up.
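The quiesce-snapshot-copy sequence described above can be sketched in a few lines of Python. This is a minimal illustration, not any hypervisor's real API: `FakeVM` and its methods are invented stand-ins that model why the backup job can read a frozen, consistent image while the guest keeps writing.

```python
class FakeVM:
    """Toy model of a hypervisor-managed VM disk (illustrative only;
    real snapshot APIs such as VMware's differ in detail)."""
    def __init__(self, disk: bytearray):
        self.disk = disk
        self.snapshot = None
        self.delta = {}          # writes made while a snapshot exists

    def quiesce_and_snapshot(self):
        # Pause pending writes, flush them, then freeze the disk state.
        self.snapshot = bytes(self.disk)

    def write(self, offset: int, data: bytes):
        if self.snapshot is not None:
            # With a snapshot open, new writes land in a delta file,
            # leaving the frozen image untouched for the backup job.
            self.delta[offset] = data
        else:
            self.disk[offset:offset + len(data)] = data

    def release_snapshot(self):
        # Fold the delta back into the live disk and drop the snapshot.
        for offset, data in self.delta.items():
            self.disk[offset:offset + len(data)] = data
        self.snapshot, self.delta = None, {}

vm = FakeVM(bytearray(b"A" * 16))
vm.quiesce_and_snapshot()
backup = vm.snapshot             # the backup app reads only the snapshot
vm.write(0, b"ZZ")               # the guest keeps running meanwhile
vm.release_snapshot()

print(backup == b"A" * 16)       # → True: backup sees the frozen image
print(bytes(vm.disk[:2]))        # → b'ZZ': live disk has the new write
```

Note that the backup copies the entire frozen image, which is exactly why deleted files and white space come along for the ride.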
To get around these problems and minimise the volume of data being backed up, solutions [have been introduced that] include data deduplication; white-space recognition; and active block mapping, where blocks that contain deleted files are excluded from the backup. And probably most importantly, changed block tracking has allowed backup vendors to perform block-level incremental backups of virtual machine images.
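Changed block tracking boils down to the hypervisor keeping a map of which disk blocks were dirtied since the last backup, so the next pass copies only those. Here is a minimal Python sketch of that idea; `CBTDisk` and its block size are invented for illustration and do not correspond to any vendor's implementation.

```python
BLOCK = 4  # toy block size in bytes; real systems use much larger blocks

class CBTDisk:
    """Toy disk with changed block tracking (CBT) -- a sketch of the
    concept, not any hypervisor's API."""
    def __init__(self, size: int):
        self.data = bytearray(size)
        self.changed = set()     # block indices dirtied since last backup

    def write(self, offset: int, payload: bytes):
        self.data[offset:offset + len(payload)] = payload
        first = offset // BLOCK
        last = (offset + len(payload) - 1) // BLOCK
        self.changed.update(range(first, last + 1))

    def incremental_backup(self) -> dict:
        # Copy only the blocks the tracker marked as changed,
        # then reset the map for the next backup cycle.
        delta = {b: bytes(self.data[b * BLOCK:(b + 1) * BLOCK])
                 for b in sorted(self.changed)}
        self.changed.clear()
        return delta

disk = CBTDisk(32)               # 8 blocks of 4 bytes
disk.write(0, b"head")
disk.incremental_backup()        # first pass copies block 0
disk.write(12, b"tail")          # dirties only block 3
print(disk.incremental_backup()) # → {3: b'tail'}
```

The second backup moves 4 bytes instead of 32, which is the whole appeal: the cost of each nightly backup scales with the change rate, not the size of the VM image.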
SearchStorage.co.UK: What are the challenges arising from the spread of virtual machines in the data centre?
Boyd: When all servers were physical, it was easy for the backup administrator to tap into the new server provisioning process and configure a backup. One of the biggest challenges now is that a new virtual machine can be created in only a few minutes, with little or no involvement from anyone outside the team that manages the virtual infrastructure.
Therefore, the opportunity for new data to be omitted from the backup schedule has never been greater. In response, several backup vendors have built in auditing modules to highlight when new VMs have been created or where VMs are being excluded from the nightly backup. But, as always with backup, rapid increases in the number of backup clients and volume of data need to be carefully managed.
IT managers have witnessed considerable financial benefits from virtualisation, and as the technologies have continued to mature, more enterprise-class applications are being hosted on virtual servers. It is now common to see databases and email applications being hosted on virtual machines. This represents a major challenge to the backup administrator, who has probably been running agent-based backups of properly quiesced databases with granular restore capabilities, but who now wants to move away from agent-level backups when protecting VMs.
Typically, the Volume Shadow Copy Service (VSS) is used to quiesce the virtual machine, but the backup administrator needs to know which VSS writer is being used and what it is quiescing. If it is just the virtual machine operating system being quiesced, then the backup of any database running there will only be crash-consistent.
If taking crash-consistent backups is a risk you are willing to take, then all well and good. If not, then you need to start digging deeper into your configuration and understand how your backup application interacts with the hypervisor you are using. When moving database and mail servers from physical to virtual, your customers will not want a reduced service, and so being able to perform granular (for example, mail-level) restores from image-level backups might be something you want to consider. Some backup vendors are starting to offer this level of service, but at the moment it is not available across the board.
SearchStorage.co.UK: What new products, methodologies and technologies help in backing up virtual servers?
Boyd: Virtualisation is all about consolidation and, from a backup point of view, the major challenge with consolidation is that there is more data in one place. The key technologies that help with the backup of virtual machines are the ability to create off-host snapshots and deduplicate your backups.
Data deduplication is vital if you are to move large amounts of data from source to target within the ever-shrinking backup window. Coupled with changed block tracking and block-level incremental backups, data deduplication allows the backup administrator to move considerable volumes of data to backup storage in ever-shorter times.
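The deduplication idea can be shown with a content-addressed store: each chunk of backup data is hashed, unique chunks are stored once, and each night's backup is reduced to a "recipe" of hashes. This is a minimal sketch under simplifying assumptions (fixed chunks, an in-memory dict as the store); `dedupe_store` is an invented helper, not a product API.

```python
import hashlib

def dedupe_store(chunks: list, store: dict) -> list:
    """Store each unique chunk once, keyed by its content hash, and
    return the recipe of hashes needed to rebuild the stream."""
    recipe = []
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)   # only new content costs space
        recipe.append(digest)
    return recipe

store = {}
night1 = [b"os-files", b"app-data", b"logs-mon"]
night2 = [b"os-files", b"app-data", b"logs-tue"]  # mostly unchanged

r1 = dedupe_store(night1, store)
r2 = dedupe_store(night2, store)

print(len(store))                         # → 4 chunks stored for 6 backed up
assert [store[h] for h in r2] == night2   # a full restore is still possible
```

Because most of a VM image (operating system files, application binaries) is identical night to night and often across VMs, the ratio of stored to ingested data improves dramatically at scale.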
Some of the larger backup vendors have been slow to adapt to the huge rise in virtualisation, and a number of purpose-built products are now available offering solutions that integrate tightly with the hypervisor.
No one wants data protection to be the limiting factor in what can and can’t be virtualised, so going forward I think backup vendors will focus on tighter integration with the hypervisor and look to develop efficient mechanisms for getting consistent backups and granular restores of virtualised enterprise-class applications and databases.
This was first published in May 2011