Disk-based backup offers faster backups and restores than tape, while eliminating many of the headaches that come with the storage and transport of tape media. When you combine the cost per gigabyte of disk and the advantages of data deduplication, disk-based backup becomes a compelling proposition for storage administrators. Furthermore, in a storage architecture, disk-based backups can form a nearline tier which matches data by age and usefulness to the cost of media.
Tape vs. disk as a backup target
Traditionally, backup has been carried out directly to tape media, but this can take a long time and can result in underutilisation if the backup process is not streaming data to tape at optimum speeds. Restoring data from tape can also be time-consuming, with tape mount times of at least three minutes and a further two minutes to seek to the first data block (or longer if the tape has been sent off-site).
Data restores can also be unreliable with tape, because of the fragility of tape and also because data searches are often impossible unless the user knows the tape and file names required. Loss and theft are also problems commonly associated with tape.
Advantages of disk-based backup
Compared to tape, disk-based backup appliances allow administrators to perform backups more quickly and efficiently over the wire, as well as to restore data far more rapidly than from tape. With disk systems, data integrity is catered for by RAID protection.
With the prices of SATA and SAS drives decreasing, cost per gigabyte for disk has begun to near that of tape. Disk-based backup can also boost utilisation when compared to tape, because disk does not depend on the backup stream keeping up with the high throughput a tape drive needs to stream efficiently. Nor will the media be transported off-site partially full, as happens with tape.
Disk-based backup products allow users to stage data to disk before it is run off to tape after a set period, in a disk-to-disk-to-tape (D2D2T) configuration. This makes restores from recent backups available almost instantly. Disk-based backup can be a core component in a storage architecture, matching the value and frequency of use of data to the cost of media, with gradations of online, nearline and archived information.
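The staging rule behind D2D2T can be sketched in a few lines of Python. This is a minimal illustration rather than any product's behaviour: the function name `backup_location` and the 14-day disk retention period are assumptions chosen for the example.

```python
from datetime import date


def backup_location(backup_date: date, today: date,
                    disk_retention_days: int = 14) -> str:
    """Return where a backup copy lives under a simple, hypothetical D2D2T policy.

    Backups stay on disk for `disk_retention_days` (an assumed value) and are
    treated as run off to tape once they are older than that.
    """
    age_days = (today - backup_date).days
    if age_days < 0:
        raise ValueError("backup_date is in the future")
    return "disk" if age_days <= disk_retention_days else "tape"


if __name__ == "__main__":
    today = date(2008, 11, 30)
    for taken in (date(2008, 11, 25), date(2008, 10, 1)):
        print(taken, "->", backup_location(taken, today))
```

In practice the retention period would be set by how far back most restore requests reach, since restores served from disk avoid tape mount and seek delays.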
Disk-based backup methods and products
Disk-based backup can be carried out with standard or intelligent disk products, using NAS or virtual tape libraries (VTLs) as targets.
Using standard disk, either standalone or using SAN drives provisioned for the task, entails dedicating disk volumes as backup targets. The standard disk approach has the advantages of disk over tape, but there are drawbacks compared to intelligent NAS or VTL devices. You will need to provision volumes for each backup server, and as your environment changes and grows, so will the amount of management overhead. Plain disk-based products without data deduplication will also work out between five and 10 times more expensive than those with it.
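The cost gap can be illustrated with simple arithmetic. The sketch below assumes, hypothetically, that the difference comes entirely from deduplication ratios of 5:1 to 10:1 and that raw disk costs the same either way; the function name and the price of 1.00 per raw gigabyte are placeholders, not vendor figures.

```python
def cost_per_logical_gb(raw_cost_per_gb: float, dedup_ratio: float = 1.0) -> float:
    """Cost of protecting one gigabyte of backup data.

    `dedup_ratio` is logical data divided by data physically stored,
    so 1.0 means no deduplication at all.
    """
    if dedup_ratio < 1.0:
        raise ValueError("dedup_ratio must be at least 1.0")
    return raw_cost_per_gb / dedup_ratio


if __name__ == "__main__":
    raw = 1.00  # placeholder price per raw gigabyte (assumption)
    plain = cost_per_logical_gb(raw)
    for ratio in (5.0, 10.0):
        deduped = cost_per_logical_gb(raw, ratio)
        print(f"{ratio:.0f}:1 dedup -> plain disk costs {plain / deduped:.0f}x more")
```

Under these assumptions, the cost multiple of plain disk simply equals the deduplication ratio achieved.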
Intelligent disk-based backup products include NAS and VTL, as well as those featuring data deduplication and thin provisioning.
Virtual tape libraries
VTLs emulate a tape library – your backup software sees disk space represented as virtual tape cartridges in a library. Data is then run off at user-determined intervals to physical tape. A VTL configuration allows businesses to repurpose their tape libraries to form a more cost-effective tier in their storage system as archive repositories to which aged or unnecessary data is relegated. VTLs can also be shared between backup products with the device represented as multiple backup targets to the backup software. Thin provisioning can also give you the jump on capacity issues.
NAS as a backup target
NAS is file-level disk storage hardware which sits on the LAN and supports common file-sharing protocols such as NFS and CIFS, appearing as a giant volume to which the backup software writes.
NAS or VTL?
Whether you opt for a VTL or NAS device as a disk-based backup target will depend on your environment and how much data you are handling. Businesses with smaller volumes of data that are using dedicated Fibre Channel or iSCSI SANs and tape libraries will be more suited to VTLs, because VTLs are optimised to store block-based data in LAN-free backup environments with the explicit intention of off-loading to tape. NAS devices, on the other hand, are optimised as file storage devices with finite storage capacity. While users will eventually run data off to tape as the devices become full, NAS devices are not optimised to carry out that task on a day-to-day basis.
Data deduplication
Data deduplication has revolutionised disk-based backup. By eliminating redundant data in backup streams, data deduplication can reduce stored data volumes by ratios as high as 50:1.
Data deduplication works by applying an algorithm to data streams that strips out duplicated blocks and replaces them with pointers to the single copy retained. Because each new backup is compared against blocks already stored, the longer data deduplication is applied to your backups, the more it can reduce the amount of data. Data deduplication works best with data types in which there are many repeated blocks. For example, backed-up databases will tend to achieve far higher reduction ratios than a collection of many different image files.
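As a rough illustration of the principle, the Python sketch below deduplicates a byte stream using fixed-size blocks identified by SHA-256 hashes. It is a simplified model and not how any particular product works: the 4KB block size, the function names and the in-memory dictionary are all assumptions made for the example, and commercial products may use variable-length blocks and on-disk indexes instead.

```python
import hashlib


def deduplicate(data: bytes, block_size: int = 4096):
    """Split data into fixed-size blocks and keep one copy of each unique block."""
    store = {}   # digest -> block contents (the single copy retained)
    recipe = []  # ordered digests: the "pointers" needed to rebuild the stream
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # store the block only the first time it is seen
        recipe.append(digest)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original stream by following the pointers in order."""
    return b"".join(store[digest] for digest in recipe)


def reduction_ratio(data: bytes, store) -> float:
    """Logical size divided by the size physically stored."""
    stored = sum(len(block) for block in store.values())
    return len(data) / stored if stored else 1.0


if __name__ == "__main__":
    repetitive = b"A" * 4096 * 50  # 50 identical blocks
    store, recipe = deduplicate(repetitive)
    assert restore(store, recipe) == repetitive
    print(f"reduction ratio: {reduction_ratio(repetitive, store):.0f}:1")
```

The sample stream of 50 identical blocks is an artificial best case. Data with few repeated blocks, such as a set of unrelated image files, would leave the ratio close to 1:1.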
For more information on data deduplication see the SearchStorageUK special report on data deduplication.
This was first published in November 2008