Using data deduplication for virtual machine backup

Deduplication can reduce data volumes during virtual machine backup, but you'll need a VM backup strategy that protects the deduplication indexes so that your data restoration can complete.

Can I use data deduplication when backing up virtual machines? What do I need to take into account when it comes to restoring data?

Data deduplication can cut down the amount of data you need to store when performing a virtual machine backup, but there are some key challenges to be aware of.

Compared with physical servers and workstations, a virtual machine's system partition is relatively easy to back up. But backing up all those OS partitions typically increases both the volume of data committed to tape or disk and the length of the backup window. This is where data deduplication can come to the rescue by reducing data volumes during backup. But you should be aware that the success of data deduplication for virtual machine backup varies widely, depending on the following factors:

  • Variations in operating systems: If an organisation runs a diverse range of operating systems across its virtual machines, all with significantly different configurations, then data reduction levels will fall significantly. If, however, an organisation runs only one operating system with a highly standardised configuration, then data reduction ratios of between 40:1 and 60:1 can be achieved.
  • Data changes: As with physical servers and workstations, virtual machines have a wide range of roles and data usage profiles. If the data held within the virtual disks has a high rate of change, then deduplication is unlikely to provide significant levels of data reduction. In such situations, a successful strategy can be to deduplicate only the operating system volumes and to leave the data volumes to more traditional backup methods.
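To illustrate why standardisation matters, here is a minimal sketch of block-level deduplication that compares the ratio for identical and dissimilar images. It assumes fixed-size 4 KB blocks and SHA-256 fingerprints; the names `dedupe_ratio` and `make_image` are hypothetical, and commercial products may use variable-size chunking and other techniques, so the numbers are illustrative only.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size, for illustration only


def dedupe_ratio(images, block_size=BLOCK_SIZE):
    """Return logical bytes divided by unique bytes across all images."""
    seen = set()   # fingerprints of blocks already stored
    logical = 0    # bytes presented for backup
    unique = 0     # bytes that actually need storing
    for image in images:
        for offset in range(0, len(image), block_size):
            block = image[offset:offset + block_size]
            logical += len(block)
            digest = hashlib.sha256(block).digest()
            if digest not in seen:
                seen.add(digest)
                unique += len(block)
    return logical / unique if unique else 1.0


def make_image(first_byte, blocks=10):
    """Build a toy 'OS image' of distinct blocks (hypothetical test data)."""
    return b"".join(bytes([first_byte + i]) * BLOCK_SIZE for i in range(blocks))


standardised = [make_image(0) for _ in range(5)]    # five identical images
diverse = [make_image(k * 10) for k in range(5)]    # five unrelated images

print(dedupe_ratio(standardised))  # 5.0
print(dedupe_ratio(diverse))       # 1.0
```

In this toy case, five identical images reduce to the size of one, while five unrelated images gain nothing from deduplication.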

Getting the most out of using data deduplication within a virtualised environment relies on the organisation carefully planning its virtual machine backup strategy and storage configuration. By profiling virtual machine data change rates and limiting the infrastructure to a maximum of one or two standardised operating systems, organisations will be able to reduce the cost of the physical backup infrastructure and the length of backup windows.
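As a rough sketch of what profiling a change rate could look like, the following compares two snapshots of the same virtual disk block by block. It assumes both snapshots are available as byte strings and uses fixed 4 KB blocks; `block_digests` and `change_rate` are hypothetical helper names, not part of any backup product.

```python
import hashlib


def block_digests(data, block_size=4096):
    """Fingerprint each fixed-size block of a disk snapshot."""
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def change_rate(previous, current, block_size=4096):
    """Fraction of blocks in `current` that differ from the block at the
    same position in `previous` (blocks beyond the old length count as changed)."""
    old = block_digests(previous, block_size)
    new = block_digests(current, block_size)
    if not new:
        return 0.0
    changed = sum(1 for i, digest in enumerate(new)
                  if i >= len(old) or old[i] != digest)
    return changed / len(new)
```

A volume that repeatedly shows a high change rate between backups is a candidate for the traditional backup path rather than deduplication.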

When it comes to data restoration, you can rely on backup copies made from data that has been deduplicated. Restoration times for entire virtual machines, including the operating system partition, are likely to improve significantly.

But it is vital to protect your data deduplication indexes and to restore everything in the right order. When performing a standard restore, all data is read sequentially from your tapes or disk. When data deduplication is involved, non-unique data is stored only once and referenced by pointers, and the indexing database is what locates the data needed to rebuild each backup. Without that index, the deduplicated data cannot be reassembled.
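The role of the index can be shown with a toy deduplicating store. This is a sketch under stated assumptions (an in-memory store, fixed-size blocks, SHA-256 fingerprints, and a hypothetical class name `DedupeStore`), not how any particular product lays out its data.

```python
import hashlib


class DedupeStore:
    """Toy deduplicating backup store (illustrative only)."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.blocks = {}  # fingerprint -> block bytes, each unique block stored once
        self.index = {}   # backup name -> ordered list of fingerprints (the pointers)

    def backup(self, name, data):
        pointers = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # store only if not seen before
            pointers.append(digest)
        self.index[name] = pointers

    def restore(self, name):
        # Raises KeyError if the index entry for this backup has been lost
        pointers = self.index[name]
        return b"".join(self.blocks[digest] for digest in pointers)
```

If the `index` entry for a backup is lost, the unique blocks are still in the store but there is no record of which blocks belong to the backup or in what order, which is why the index needs protecting alongside the data.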

Regardless of whether your backup data is deduplicated or not, a carefully constructed backup cycle needs to be maintained. You will not necessarily need to take a full weekly backup, but retention policies need to be set so that there is always a copy from which to recover.
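One way to express the "always a copy to recover from" rule is a pruning check that never drops below a minimum number of retained backups. The sketch below assumes backups are identified only by date and that each retained copy is independently restorable; the function name `backups_to_keep` and its parameters are hypothetical.

```python
from datetime import date, timedelta


def backups_to_keep(backup_dates, today, retention_days, minimum_copies=1):
    """Return the backup dates to retain, newest first."""
    ordered = sorted(backup_dates, reverse=True)
    cutoff = today - timedelta(days=retention_days)
    keep = [d for d in ordered if d >= cutoff]
    # Never fall below the minimum, even if every backup is past the cutoff
    if len(keep) < minimum_copies:
        keep = ordered[:minimum_copies]
    return keep
```

For example, with weekly backups and a seven-day retention period, the most recent backup is still kept even if no new backup has been taken for months.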

