Snapshots are a well-known and mature data protection technology, historically found in storage arrays. Hypervisors now offer snapshots too, and having two places in which to protect data raises an obvious question: where is the best place to take a snapshot, and what are the pros and cons of each?
A snapshot is a point-in-time copy of data that represents an image of a volume or a LUN (logical unit number) and can be used for backup and data recovery. There is continual debate within the industry as to whether a snapshot is a true backup: an individual snapshot depends on the source volume from which it derives, so it does not protect against hardware failure.
However, a snapshot can be used to recover anything from individual files to an entire virtual machine or application server.
Snapshots work by manipulating the metadata that is used to map a logical LUN or volume to its physical location on disk or flash. A logical volume will typically be divided into blocks from 4KB upwards in size. The snapshot process copies these metadata pointers, allowing a snapshot to represent a point-in-time copy of the volume.
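To make the "copy the metadata, not the data" idea concrete, here is a minimal toy model in Python. It assumes a volume is nothing more than a dictionary mapping logical block numbers to physical addresses, plus a shared pool of physical blocks; the `Volume` class and its methods are hypothetical illustrations, not any array's real interface.

```python
class Volume:
    """Toy volume: a logical-to-physical block map plus shared physical storage."""

    def __init__(self, physical, block_map):
        self.physical = physical    # shared dict: physical address -> block data
        self.block_map = block_map  # metadata: logical block number -> physical address

    def read(self, lba):
        # Follow the metadata pointer to the physical block.
        return self.physical[self.block_map[lba]]

    def snapshot(self):
        # Only the pointer map is copied; no data blocks are read or moved.
        return Volume(self.physical, dict(self.block_map))


physical = {0: b"alpha", 1: b"beta"}
vol = Volume(physical, {0: 0, 1: 1})
snap = vol.snapshot()
print(snap.read(1))                     # b'beta'
print(snap.block_map is vol.block_map)  # False: the snapshot has its own metadata
print(snap.physical is vol.physical)    # True: both still point at the same data
```

The snapshot costs one copy of the map, regardless of how much data the volume holds, which is why snapshots can be taken so quickly.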
Snapshots fall into three types:
- Changed-block snapshots. These implementations fall into two categories: copy-on-write and redirect-on-write. A copy-on-write snapshot preserves the snapshot image by copying the original contents of a block to another location, typically a dedicated snapshot area, the first time that block is overwritten after the snapshot is taken. The volume itself is then updated "in place", at the same physical disk location. A redirect-on-write snapshot instead directs updates to a block within the volume to unused space on disk, so updates are always written to free space and the original blocks stay where they are for the snapshot to use.
- Clones. This implementation copies the entire volume to new physical space on disk. Although the snapshot is expensive in terms of the additional space required (and the overhead of moving the data), a clone does provide a degree of physical protection when copied to a separate set of physical media.
- CDP. Continuous data protection is a different approach to protecting data that tracks all updates to a volume. Theoretically, this means a volume can be reverted to any point in time, usually at the level of individual block updates. CDP systems can be expensive in terms of additional disk space, but do provide a high level of granularity on restores.
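The difference between the two changed-block techniques shows up in what happens on a write. The sketch below is a simplified illustration under stated assumptions: blocks are held in Python lists and dicts, only one snapshot is supported, and the class names are hypothetical. Copy-on-write keeps the live volume in place and copies old data aside; redirect-on-write leaves old data alone and moves a pointer.

```python
class CopyOnWriteVolume:
    """Copy-on-write: live data is updated in place; originals are copied aside first."""

    def __init__(self, blocks):
        self.blocks = list(blocks)  # live volume, updated in place
        self.snap_area = {}         # dedicated snapshot area: block number -> original data
        self.snap_active = False

    def take_snapshot(self):
        self.snap_area = {}
        self.snap_active = True

    def write(self, lba, data):
        # On the first write to a block since the snapshot, preserve its old contents
        # (an extra read and write compared with a plain update).
        if self.snap_active and lba not in self.snap_area:
            self.snap_area[lba] = self.blocks[lba]
        self.blocks[lba] = data  # in-place update of the same location

    def read(self, lba):
        return self.blocks[lba]

    def read_snapshot(self, lba):
        # Changed blocks come from the snapshot area; unchanged ones from the volume.
        return self.snap_area.get(lba, self.blocks[lba])


class RedirectOnWriteVolume:
    """Redirect-on-write: updates always go to free space; the snapshot keeps old pointers."""

    def __init__(self, blocks):
        self.store = list(blocks)                 # physical blocks
        self.live_map = list(range(len(blocks)))  # metadata: logical -> physical
        self.snap_map = None

    def take_snapshot(self):
        self.snap_map = list(self.live_map)  # copy pointers only

    def write(self, lba, data):
        # Write to free space and redirect the live pointer; nothing is copied.
        # (This toy never reclaims superseded blocks.)
        self.store.append(data)
        self.live_map[lba] = len(self.store) - 1

    def read(self, lba):
        return self.store[self.live_map[lba]]

    def read_snapshot(self, lba):
        return self.store[self.snap_map[lba]]


for cls in (CopyOnWriteVolume, RedirectOnWriteVolume):
    vol = cls(["a", "b", "c"])
    vol.take_snapshot()
    vol.write(1, "B")
    print(cls.__name__, vol.read(1), vol.read_snapshot(1))  # ... B b
```

Both volumes return the same results, but copy-on-write pays an extra copy on the first write to each block after a snapshot, while redirect-on-write avoids that copy at the cost of scattering the live volume across free space.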
Snapshots in the hypervisor
Hypervisor-based snapshots provide a way to take an image copy of a virtual machine (VM), either to access and restore individual files, to provide a rollback point for the VM, or to clone the VM to another virtual machine. A VM's disks are simply files (VMDKs in the case of VMware vSphere and VHD files on Microsoft Hyper-V), so creating and managing snapshots is a matter of manipulating these image files.
Both vSphere and Hyper-V manage snapshots by using secondary files associated with the VMDK/VHD to store updates to a VM after a snapshot is taken. These updates accumulate until the snapshot is deleted, at which time the secondary files are integrated back into the original VMDK/VHD.
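The base-file-plus-secondary-file pattern can be sketched as a two-layer lookup. This is a simplified model, not the real VMDK or VHD format: it assumes a single snapshot, represents each file as a Python dict of blocks, and uses hypothetical method names.

```python
class VirtualDisk:
    """Toy virtual disk: a base image plus an optional delta file created by a snapshot."""

    def __init__(self, base):
        self.base = dict(base)  # block -> data; stands in for the base VMDK/VHD
        self.delta = None       # block -> data; stands in for the secondary (delta) file

    def take_snapshot(self):
        # From now on the base is frozen and all writes land in the delta file.
        self.delta = {}

    def write(self, block, data):
        target = self.delta if self.delta is not None else self.base
        target[block] = data

    def read(self, block):
        # The newest copy wins: check the delta file first, then the base.
        if self.delta is not None and block in self.delta:
            return self.delta[block]
        return self.base[block]

    def revert_to_snapshot(self):
        # Roll back: discard everything written since the snapshot.
        self.delta = {}

    def delete_snapshot(self):
        # Merge every accumulated update back into the base; cost grows with delta size.
        if self.delta is None:
            return 0
        merged = len(self.delta)
        self.base.update(self.delta)
        self.delta = None
        return merged


disk = VirtualDisk({0: "os", 1: "data"})
disk.take_snapshot()
disk.write(1, "data-v2")
disk.write(2, "log")
print(disk.read(1), disk.base[1])  # data-v2 data
print(disk.delete_snapshot())      # 2
print(disk.read(2))                # log
```

The value returned by `delete_snapshot` is the number of blocks that had to be merged, which illustrates why a long-lived hypervisor snapshot becomes expensive to remove.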
Snapshots in the storage array
As already mentioned, snapshots in the storage array are managed by manipulating metadata used to track the logical-to-physical relationship of LUNs/volumes to data on disk. When a snapshot copy is taken, the array replicates the metadata that maps the physical layout on disk/flash. At this point, one or more snapshots could reference the same physical data on disk.
As the source volume continues to be updated, changed blocks are either moved out or written to new free space, depending on the snapshot technique. When a snapshot is no longer required, the metadata is simply deleted and unique blocks “owned” by the snapshot are released.
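One way to decide which blocks a deleted snapshot "owned" is to count how many block maps still reference each physical block. The sketch below assumes redirect-on-write and recomputes reference counts from scratch on every call, which keeps it short but is far slower than what a real array would do; the class and method names are hypothetical.

```python
from collections import Counter


class SnapshotArray:
    """Toy array: one live map plus named snapshot maps sharing physical blocks."""

    def __init__(self, blocks):
        self.store = dict(enumerate(blocks))            # physical address -> data
        self.maps = {"live": list(range(len(blocks)))}  # name -> logical-to-physical map
        self.next_free = len(blocks)

    def _refcounts(self):
        # How many maps (live volume or snapshots) point at each physical block.
        return Counter(addr for m in self.maps.values() for addr in m)

    def _release_unreferenced(self, candidates):
        counts = self._refcounts()
        freed = sorted(a for a in set(candidates) if counts[a] == 0)
        for addr in freed:
            del self.store[addr]
        return freed

    def snapshot(self, name):
        self.maps[name] = list(self.maps["live"])  # metadata copy only

    def write(self, lba, data):
        # New data goes to free space; the old block survives if a snapshot still uses it.
        old = self.maps["live"][lba]
        self.store[self.next_free] = data
        self.maps["live"][lba] = self.next_free
        self.next_free += 1
        self._release_unreferenced([old])

    def delete_snapshot(self, name):
        # Drop the snapshot's metadata, then free blocks no other map references.
        return self._release_unreferenced(self.maps.pop(name))


arr = SnapshotArray(["a", "b", "c"])
arr.snapshot("daily")
arr.write(1, "B")                    # old block 1 is now referenced only by "daily"
print(arr.delete_snapshot("daily"))  # [1]
print(sorted(arr.store))             # [0, 2, 3]
```

Blocks shared with the live volume (0 and 2 here) survive the deletion; only the block unique to the snapshot is released.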
Pros and cons
Array-based snapshots are typically very quick to take, as they are simply a copy of metadata, usually stored in memory, but there can be a small impact on I/O performance while the copy process executes. The number of supported snapshots varies by platform, with some suppliers providing support for thousands of snapshots per system. Most suppliers offer advanced scheduling to automate the snapshot process.
An array-based snapshot is a copy of the image of a running VM or application server at a specific point in time and, as a result, the snapshot will appear as a “crash copy” of that VM/application if it is fully restored and accessed. Remember also that snapshots on the array are based on a LUN or volume (which, in turn, will map to a datastore in the hypervisor).
Because a single LUN or volume typically hosts a datastore containing many VMs, an array-based snapshot may capture many VMs at once, making it difficult to build schedules around protecting individual virtual machines. This is expected to change with the introduction of VMware Virtual Volumes (VVOLs).
Hypervisor-based snapshots, on the other hand, operate at the VM level, allowing a snapshot policy to be applied to each VM individually. Also, where integration tools have been deployed to a VM, the snapshot process can be synchronised with quiescence or suspension of I/O at the VM/application level, to provide a more consistent image rather than a “crash copy”.
The disadvantage of hypervisor-based snapshots is the overhead of writing updates to separate delta files and then merging those updates back into the VMDK/VHD when the snapshot is deleted. This process can be time-consuming and can directly affect performance.
Choosing the right approach
Hypervisor-based snapshots are a good choice where application consistency is essential, and are the only choice if the underlying storage platform has no snapshot support. The hypervisor-based approach is also more space-efficient where datastores are built from large LUNs: a hypervisor snapshot retains changes only for the VM being protected, whereas an array-based snapshot of the LUN retains changed blocks for every VM on the datastore.
But array-based solutions have the performance edge and, as a result, one solution used by backup suppliers is to apply a combination of hypervisor- and array-based snapshots at the same time.
The process works by initiating a hypervisor snapshot to suspend I/O for consistency, followed by taking an array-based snapshot for flexibility/performance. The hypervisor snapshot can then be released almost immediately, resulting in very little data to reintegrate into the VMDK.
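The ordering of the combined approach can be sketched as a short orchestration routine. Everything here is hypothetical: `HypervisorStub` and `ArrayStub` stand in for whatever management APIs a backup product actually calls, and they simply record each call in a shared log so the sequence is visible.

```python
class HypervisorStub:
    """Hypothetical stand-in for a hypervisor API; records calls in a shared log."""

    def __init__(self, log):
        self.log = log

    def create_vm_snapshot(self, vm):
        # A real call would ask in-guest integration tools to quiesce I/O first.
        self.log.append(f"hypervisor snapshot {vm}")
        return f"{vm}-snap"

    def delete_vm_snapshot(self, snap_id):
        # Deleting the snapshot merges its (small) delta back into the VM disk.
        self.log.append(f"hypervisor delete {snap_id}")


class ArrayStub:
    """Hypothetical stand-in for a storage array API."""

    def __init__(self, log):
        self.log = log

    def snapshot_lun(self, lun):
        self.log.append(f"array snapshot {lun}")
        return f"{lun}-arraysnap"


def combined_snapshot(hypervisor, array, vms, lun):
    # 1. Quiesce each VM on the datastore with a hypervisor snapshot.
    vm_snaps = [hypervisor.create_vm_snapshot(vm) for vm in vms]
    try:
        # 2. Capture the LUN while every VM image is in a consistent state.
        return array.snapshot_lun(lun)
    finally:
        # 3. Release the hypervisor snapshots straight away, even if step 2 failed,
        #    so very little data needs to be merged back into each VM disk.
        for snap_id in vm_snaps:
            hypervisor.delete_vm_snapshot(snap_id)


log = []
result = combined_snapshot(HypervisorStub(log), ArrayStub(log), ["vm1", "vm2"], "lun7")
print(result)  # lun7-arraysnap
for entry in log:
    print(entry)
```

The `try`/`finally` is a design choice in this sketch: the hypervisor snapshots exist only to hold the VMs consistent for the instant the array snapshot is taken, so they are released whether or not the array call succeeds.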
This solution gives the best of both worlds – data integrity with the flexibility and performance of hardware-based protection.