Disc-based backup overview

Implementing a backup scheme that is fast, safe and cost-effective remains a serious challenge for storage administrators. Here's how to do it.

The best way to protect valuable corporate data is to create one or more copies of that data -- backups. However, implementing a backup scheme that is fast, safe and cost-effective remains a serious challenge for storage administrators. Traditional tape systems simply do not meet the demands of today's busy datacentres, and administrators are turning to disc-based storage for data protection, backup tasks and archival storage.

Disc backup systems

Backups are all about time. It can take several hours to back up a large server to tape, and backing up an entire datacentre to tape can take 12 hours or more. Since most network data is inaccessible during a backup, the procedure is typically performed in the evenings or at other off-peak hours to minimise service interruptions to end users.

With hard disc costs falling, administrators realise that disc-based storage can accomplish the same backup tasks in a fraction of the time needed for tape, while remaining cost-competitive with tape systems. Not only does this shrink the backup window, but those same backup discs can also speed recovery, helping to meet recovery time objectives (RTOs). Once data is on a disc-based backup system, RAID techniques protect data integrity and guard against data loss when a disc fails. These are significant benefits for busy organisations that rely on 24/7 network operations.
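
The effect of throughput on the backup window is simple arithmetic, sketched below. The data size and the two throughput figures are hypothetical values chosen only to show the calculation; they are not measurements of any tape or disc product, and the function name is illustrative.

```python
def backup_hours(data_tb, throughput_mb_per_s):
    """Hours needed to move data_tb terabytes at a sustained throughput."""
    megabytes = data_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    return megabytes / throughput_mb_per_s / 3600

# Hypothetical figures, for illustration only.
data_tb = 10
slow_target_hours = backup_hours(data_tb, throughput_mb_per_s=80)
fast_target_hours = backup_hours(data_tb, throughput_mb_per_s=400)

print(f"slower target: {slow_target_hours:.1f} h")
print(f"faster target: {fast_target_hours:.1f} h")
```

Because the window scales inversely with sustained throughput, a target that is five times faster needs one fifth of the time for the same data set.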

Disc backup systems can be implemented as a modular NAS, such as EMC Corp.'s NS series, or as an expandable SAN platform, like EMC's Clariion CX700. Sun Microsystems Inc. and Hitachi Data Systems Inc. are other recognised providers of disc storage systems. In addition to the hardware, administrators must also consider backup software and storage management tools. For example, EMC's Clariion includes the Navisphere Management Suite for storage management, configuration and monitoring, in addition to snapshot, mirroring and replication software. It's important to consider the availability and cost of software when selecting a disc storage system [see the SearchStorage.com AIOG on NAS].

Disc vs. tape

Making an argument for tape is becoming increasingly difficult. Tape is a mature and relatively inexpensive offline storage technology, but it does have limitations. Tape is generally slow, so large backup and restore operations can take hours -- even days. Searches are often impractical unless the user knows exactly which tapes and file names are required -- difficulties that increase dramatically when there are hundreds or thousands of tape cartridges to contend with. The media itself is vulnerable to loss or theft, especially when tapes must be transported to off-site storage.

Disc-based storage has emerged as a cost-effective alternative to tape, allowing companies to retain huge amounts of data on inexpensive SATA or SAS disc arrays. Disc's superior performance supports quick backups and restorations, and improves the user service level by making corporate data available "nearline." Disc arrays also leverage RAID techniques to maintain data integrity -- if one disc fails, the data on that disc can be rebuilt to a spare drive. Although disc systems often cost more to buy than tape libraries, users note that the total cost of ownership is generally comparable. One notable disadvantage of disc is that ordinary media (the disc drives) cannot be removed and shipped to off-site storage unless the storage system employs specially designed removable disc drives. When ordinary discs are employed as a local backup target, the data is usually replicated off-site across a WAN to ensure disaster recovery or business continuance.
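
The rebuild step can be illustrated with a minimal sketch of single-parity striping in the style of RAID 5. It assumes one XOR parity block per stripe; the function name and the tiny four-byte blocks are illustrative and do not come from any vendor's implementation.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data blocks in one stripe (hypothetical contents), one per disc.
data = [b"ACCT", b"MAIL", b"LOGS"]
parity = xor_blocks(data)  # stored on a fourth disc

# Simulate losing the second disc, then rebuild its block
# from the surviving data blocks plus the parity block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])
```

XOR works here because combining the parity with every surviving block cancels them out and leaves only the missing block. A single parity block tolerates one failed disc per stripe, not two.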

Virtual tape libraries

Although tape systems are hard pressed to meet shrinking backup and restore windows, many companies have a significant investment in tape-oriented backup software and management tools. Rather than discarding the tape paradigm and re-architecting a backup system from scratch, administrators often deploy virtual tape libraries (VTLs) as a more convenient solution. A VTL stores data on hard disc, but it appears to be a conventional tape system to backup software and hierarchical storage management (HSM) tools. Administrators can create "virtual tape drives" and segment virtual "tape cartridges" in the disc space, allowing the VTL to mimic a tape library in every way. This gives VTLs far better performance than true tape libraries, while also easing integration with existing storage infrastructures and keeping the learning curve to a minimum. IBM, Sun (StorageTek brand) and FalconStor Software Inc. are three recognised VTL vendors.

Eliminating tape media prevents tape loss or damage, but many VTL systems include support for more traditional tape hardware -- an existing tape library can be attached to the VTL rather than discarded. This allows storage management software to relegate aged or unnecessary data from VTL hard disc to tape for long-term, low-priority archiving if desired (e.g., using a VTL as a secondary storage tier and supporting tape as a third tier). In cases where administrators need to balance archival tape and disc performance, a VTL may be used to maintain corporate data on faster disc, but periodically perform complete backups to tape without interrupting normal network behaviour.
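
An age-based tiering rule of this kind might look like the sketch below. The cartridge records, the `plan_migration` function and the 90-day threshold are all hypothetical; real VTL and storage management products expose their own policy settings.

```python
from datetime import date, timedelta

def plan_migration(cartridges, today, max_age_days=90):
    """Split virtual cartridges into those kept on disc and those moved to tape."""
    cutoff = today - timedelta(days=max_age_days)
    keep = [c for c in cartridges if c["last_written"] >= cutoff]
    migrate = [c for c in cartridges if c["last_written"] < cutoff]
    return keep, migrate

# Hypothetical virtual cartridges held on the VTL's disc tier.
cartridges = [
    {"label": "VT0001", "last_written": date(2024, 1, 5)},
    {"label": "VT0002", "last_written": date(2024, 5, 20)},
    {"label": "VT0003", "last_written": date(2024, 2, 28)},
]

keep, migrate = plan_migration(cartridges, today=date(2024, 6, 1))
print("stay on disc:", [c["label"] for c in keep])
print("move to tape:", [c["label"] for c in migrate])
```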

Emerging backup technologies

Disc storage is being deployed in a variety of advanced backup tasks. Disc arrays are implementing data deduplication (a.k.a. intelligent compression or single-instance storage) for backup targets and archiving platforms such as content-addressed storage (CAS). Single-instance storage is notable because only one copy of a given file is ever retained -- even when multiple copies of a file or attachment are being backed up. Deduplication can offer disc space savings of up to 50-to-1. Many CAS archiving systems also include fingerprinting technologies that verify files are unmodified, which is beneficial for electronic discovery and other corporate governance needs [see the SearchStorage.com article on CAS].
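
File-level single-instance storage and fingerprinting can be sketched together, since both rest on a content hash. The example below assumes SHA-256 as the fingerprint and whole-file granularity; the `SingleInstanceStore` class and the file names are hypothetical, and commercial products may instead deduplicate at block or sub-file level.

```python
import hashlib

class SingleInstanceStore:
    """Toy content-addressed store: one stored copy per unique content."""

    def __init__(self):
        self.blobs = {}    # fingerprint -> content
        self.catalog = {}  # file name -> fingerprint

    def put(self, name, content):
        fingerprint = hashlib.sha256(content).hexdigest()
        self.blobs.setdefault(fingerprint, content)  # store only if unseen
        self.catalog[name] = fingerprint
        return fingerprint

    def get(self, name):
        return self.blobs[self.catalog[name]]

    def verify(self, name):
        """Re-hash the stored content and compare with the recorded fingerprint."""
        fingerprint = self.catalog[name]
        return hashlib.sha256(self.blobs[fingerprint]).hexdigest() == fingerprint

store = SingleInstanceStore()
attachment = b"Q3 budget spreadsheet" * 1000  # same attachment in three mailboxes
for user in ("alice", "bob", "carol"):
    store.put(f"{user}/inbox/budget.xls", attachment)

logical = len(attachment) * len(store.catalog)          # bytes backed up
physical = sum(len(b) for b in store.blobs.values())    # bytes actually stored
print(f"{logical // physical}-to-1 reduction, intact: {store.verify('bob/inbox/budget.xls')}")
```

The ratio in this toy case depends only on how many identical copies are backed up, which is why real-world savings vary so widely with the data.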

Discs are also providing improved backups using snapshot and continuous data protection (CDP) techniques to lower recovery times. Snapshots are periodic "saves" of system data, which can be initiated several times each day -- even once an hour. System administrators can recover a system from its latest snapshot on disc. CDP offers more granularity, recording each I/O operation into a disc record and allowing administrators to recover a system right down to the last write with virtually no data loss. All of these technologies are impossible with tape.
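
The difference in granularity can be shown with a toy model. The sketch below assumes integer timestamps and journals only writes; the `ProtectedVolume` class and its method names are hypothetical and do not reflect any particular snapshot or CDP product.

```python
class ProtectedVolume:
    """Toy volume that supports snapshots and a CDP-style write journal."""

    def __init__(self):
        self.blocks = {}     # block number -> data
        self.snapshots = {}  # timestamp -> copy of blocks at that time
        self.journal = []    # (timestamp, block number, data) for every write

    def write(self, t, block, data):
        self.blocks[block] = data
        self.journal.append((t, block, data))

    def snapshot(self, t):
        self.snapshots[t] = dict(self.blocks)

    def restore_from_snapshot(self, t):
        """Return the state held in the latest snapshot taken at or before t."""
        usable = [s for s in self.snapshots if s <= t]
        return dict(self.snapshots[max(usable)]) if usable else {}

    def restore_from_journal(self, t):
        """Replay every journaled write up to and including time t."""
        state = {}
        for when, block, data in self.journal:
            if when <= t:
                state[block] = data
        return state

vol = ProtectedVolume()
vol.write(1, 0, "order-100")
vol.snapshot(2)
vol.write(3, 1, "order-101")
vol.write(4, 0, "order-100-amended")

from_snapshot = vol.restore_from_snapshot(4)  # loses the writes at t=3 and t=4
from_journal = vol.restore_from_journal(4)    # includes every write up to t=4
print(from_snapshot)
print(from_journal)
```

Recovering from the snapshot loses everything written after it was taken, while replaying the journal can stop at any chosen write, at the cost of storing every change.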
