This data backup guide provides an overview of data protection and backup products. Topics covered include new directions in data backup products, data replication products, continuous data protection (CDP), snapshots and backup product features. A listing of backup and data protection vendors is also included. While not a comprehensive survey of specific backup and data protection products, this data backup guide does provide an overview of the types of utility available and their features so prospective buyers can decide what combination of products suits their needs.
Table of contents
Backup and replication as complementary tools in the toolbox
New directions in backup products
Continuous data protection (CDP)
Backup product features
Backup and data protection vendors
Data protection and backup products are designed to effect the duplication of data from production systems to secondary media in different ways. Reasons for doing this vary from the need to retain data because of business or legal constraints, to the necessity of copying data so it can be restored in case of technical failure in production systems.
Backup is usually taken to mean the copying of selected data, applications and system files to secondary media such as a tape library or, increasingly, disk systems. Backups are often local, but can also involve sending data to another physical storage facility across a wide-area network (WAN).
Backups are carried out using backup software that organises the flow of data from its sources to the backup target media. The software also allows technical staff to decide what's backed up to where, and to monitor the progress and results of backups.
Backups often make copies of data in a proprietary format, which means backup software must first restore data before it can be accessed. Traditional backup is therefore most suitable for protecting large volumes of data or entire systems to which immediate access isn't necessarily required.
The main enterprise backup products are EMC Corp.'s NetWorker, IBM Corp.'s Tivoli Storage Manager (TSM) and Symantec Corp.'s Veritas NetBackup; however, other vendors have carved out niches that address specific needs such as remote-office and mobile workstation backup.
In recent years, other forms of data protection have arisen that make direct full and incremental copies of files in their native format to secondary disk media. They vary in terms of how often and in what way they make those copies -- from synchronous and asynchronous replication, to methods such as continuous data protection, which incrementally registers changed data blocks. Making direct copies of data in native format means file retrieval is usually a lot quicker and simpler than with traditional backups.
Backup software has traditionally been tightly integrated with tape as a target, but that's changing as disk becomes more prevalent and new requirements arise, such as closer integration with numerous applications and sites running multiple hardware and software platforms.
Atempo Inc.'s Time Navigator, for example, provides support for storage-area network (SAN) and network-attached storage (NAS) environments, encryption and hierarchical key management. Among products aimed at small- to medium-sized enterprises (SMEs), Symantec's Backup Exec 10d can manage the staging of data to disk before moving it to tape.
The heterogeneity of environments and targets may mean users have to deploy several products, with each one dedicated to specific needs.
For example, shops with NetApp Inc. filers might need to add a backup utility that complements the big three backup products to overcome the inability of NetApp's SnapVault snapshot feature to quickly execute snapshots on dense file systems. Syncsort Inc.'s Backup Express, for example, might better satisfy snapshot requirements in file systems that contain millions of small files.
Remote-office support is another instance where users may want to consider complementing their main backup product with another, such as Yosemite Technologies Inc.'s FileKeeper (now owned by Barracuda Networks Inc.).
An increasing number of tools support data protection features such as synchronous and asynchronous replication, CDP and snapshots, which are becoming mainstream.
Many disk storage platforms include applications to support replication-type features. EMC's Clariion CX3 Model 80 includes SnapView software for local replication and MirrorView software for remote replication.
There are three fundamental approaches to remote replication: host based, array based and fabric based.
A host-based approach runs software on a server or dedicated appliance to manage the transfer of data across the WAN to a target system. One example is EMC RecoverPoint, which connects directly into the SAN.
Host-based replication is usually the least-expensive option, but it can fall short of the performance of other approaches.
With the array-based approach, remote replication is carried out between storage arrays using application software that's bundled with the arrays themselves.
This method was often restricted to the use of identical arrays -- Symmetrix to Symmetrix, for example -- but this is changing as replication software begins to support greater hardware heterogeneity. The EMC Clariion AX150 array, for example, ships with EMC's SAN Copy software that delivers remote point-in-time data copies between Clariion, Symmetrix, IBM, Sun Microsystems Inc. and Hitachi Data Systems arrays.
Remote data replication is now appearing in the fabric, usually in the form of software that runs on SAN switches. NetApp ReplicatorX (formerly the Topio Data Protection Suite) is one example. EMC and FalconStor Software Inc. offer similar products. The appeal of replication in the fabric is that it supports a broad range of devices with no major performance impact.
Remote data replication can be synchronous or asynchronous. There are pros and cons to each approach.
Synchronous replication is real-time replication in which a write is acknowledged to the application only after the data has been committed to both primary storage and the secondary disk. Because every write has to wait for the remote disk, the round-trip latency limits synchronous replication to shorter distances. WAN interruptions can also severely affect synchronous replication.
With asynchronous replication, a write is acknowledged as soon as it's committed to primary storage. The local disk then passes data to the remote disk as time and bandwidth allow. This means the replicated disk content will lag behind local data but, because it's not attempting real-time mirroring, asynchronous replication works well over long distances; in addition, bandwidth and WAN interruptions are less of a problem than with synchronous replication.
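The difference between the two modes comes down to when the write is acknowledged. The following is a minimal Python sketch of that ordering only, not a model of any vendor's product; the class names (Volume, SyncReplicator, AsyncReplicator) and the drain method are hypothetical, and the WAN is reduced to a plain method call.

```python
from collections import deque


class Volume:
    """A toy block store that maps a block number to its bytes."""

    def __init__(self):
        self.blocks = {}

    def write(self, block_no, data):
        self.blocks[block_no] = data


class SyncReplicator:
    """Acknowledges a write only after both copies hold the data."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary

    def write(self, block_no, data):
        self.primary.write(block_no, data)
        # In a real system the WAN round trip would be paid here,
        # before the application sees the acknowledgement.
        self.secondary.write(block_no, data)
        return "ack"


class AsyncReplicator:
    """Acknowledges after the local write and ships the data later."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary
        self.pending = deque()  # writes not yet applied to the remote copy

    def write(self, block_no, data):
        self.primary.write(block_no, data)
        self.pending.append((block_no, data))
        return "ack"

    def drain(self, max_writes=None):
        """Apply queued writes to the secondary, oldest first."""
        sent = 0
        while self.pending and (max_writes is None or sent < max_writes):
            block_no, data = self.pending.popleft()
            self.secondary.write(block_no, data)
            sent += 1
        return sent
```

In this sketch the length of the pending queue is the replication lag: whatever is still queued when the primary site fails is data the remote copy never received.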
Continuous data protection is a variation on replication in which data is backed up whenever a change is made. CDP effectively creates a journal of snapshots, with one snapshot generated every time data modification occurs.
A key advantage of CDP is that it preserves a record of every transaction that occurs. If the system becomes infected or corrupted, and the problem is discovered some time later, you're able to recover the most recent clean copy of the file. CDP in combination with disk offers recovery in seconds.
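The journal idea can be illustrated with a few lines of Python. This is a simplified sketch that assumes file-level changes and integer timestamps; the CdpJournal class and its record and restore methods are hypothetical names, and real CDP products typically journal at the block level.

```python
class CdpJournal:
    """Append-only log of writes, each tagged with a logical timestamp."""

    def __init__(self):
        self.entries = []  # (timestamp, path, data) in arrival order

    def record(self, timestamp, path, data):
        """Log one change; timestamps must never go backwards."""
        if self.entries and timestamp < self.entries[-1][0]:
            raise ValueError("timestamps must not go backwards")
        self.entries.append((timestamp, path, data))

    def restore(self, path, as_of):
        """Return the latest version of path written at or before as_of."""
        latest = None
        for timestamp, entry_path, data in self.entries:
            if timestamp > as_of:
                break  # entries are ordered, so nothing later can qualify
            if entry_path == path:
                latest = data
        return latest
```

If a file is found to have been corrupted at time 3, calling restore with as_of=2 returns the last version written before the damage, which is the "most recent clean copy" described above.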
A snapshot is made up of reference markers, or pointers, to data stored on disk or possibly tape. In effect, it's a detailed table of contents, but is presented to the user as a complete backup. Snapshots speed access to stored data and can offer quick recovery. There are two main types of snapshot: copy-on-write and split-mirror. Products are available that can automatically generate either type.
A copy-on-write snapshot creates a snapshot of changes to stored data every time data is updated. This allows rapid recovery of data if needed, but all previous snapshots must be available if recovery of all data is required.
A split-mirror snapshot references all of the data on a set of mirrored drives; every time the tool is run, a snapshot is created of the entire volume, not just the new or updated data. This simplifies the process of recovering or duplicating the data on a drive, but it's a slower process and requires more storage space for snapshots.
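To make the contrast concrete, here is a small Python sketch of one common copy-on-write scheme, in which a block's original contents are saved the first time it's overwritten after a snapshot is taken. The CowVolume class and its method names are hypothetical, and actual products differ in how they chain and store snapshots; the split_mirror_copy method is included only to show the full-volume copy that a split-mirror snapshot amounts to.

```python
class CowVolume:
    """A toy volume with copy-on-write snapshots."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []  # one dict per snapshot: block number -> saved data

    def snapshot(self):
        """Take a snapshot; it starts empty because nothing has changed yet."""
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, block_no, data):
        # Before overwriting, save the old contents into every snapshot
        # that has not already preserved this block.
        for saved in self.snapshots:
            if block_no not in saved:
                saved[block_no] = self.blocks[block_no]
        self.blocks[block_no] = data

    def read_snapshot(self, snap_id, block_no):
        """Read a block as it was when the snapshot was taken."""
        saved = self.snapshots[snap_id]
        if block_no in saved:
            return saved[block_no]
        return self.blocks[block_no]  # unchanged since the snapshot

    def split_mirror_copy(self):
        """Copy the whole volume, as a split-mirror snapshot would."""
        return list(self.blocks)
```

The copy-on-write snapshot stores only the blocks that changed, so it's small and quick to create, but reading it still depends on the live volume for every unchanged block. The split-mirror copy is self-contained at the cost of duplicating the entire volume.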
Data deduplication: Data deduplication reduces the volume of data backed up by replacing duplicated files or blocks with pointers to the original instance. Deduplication can speed backup times and cut down on the amount of WAN bandwidth required. It can also reduce the amount of space needed for the backup.
Many backup software products incorporate data deduplication, but it's also incorporated in disk products that can be used as backup targets.
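As an illustration of block-level deduplication, the Python sketch below splits each file into fixed-size chunks, stores each unique chunk once under its SHA-256 digest and keeps a list of digests per file as the "pointers". The DedupStore class, its method names and the fixed chunk size are assumptions made for the example; commercial products often use variable-size chunking and their own index formats.

```python
import hashlib


class DedupStore:
    """A toy backup target that stores each unique chunk only once."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # digest -> chunk bytes
        self.files = {}   # file name -> ordered list of digests

    def backup(self, name, data):
        """Split data into chunks and store only the ones not seen before."""
        recipe = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)
            recipe.append(digest)
        self.files[name] = recipe

    def restore(self, name):
        """Rebuild a file by following its list of chunk digests."""
        return b"".join(self.chunks[digest] for digest in self.files[name])

    def stored_bytes(self):
        """Total size of the unique chunks actually kept."""
        return sum(len(chunk) for chunk in self.chunks.values())
```

Backing up two files that share most of their content adds only the chunks that differ, which is where the savings in storage space and WAN bandwidth come from.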
Encryption: Encryption is often incorporated into backup products to protect data at rest or in flight across a local-area network (LAN) or WAN. Symantec's Veritas NetBackup, for example, provides encryption as an option that allows tape or disk data to be encrypted during backup or decrypted during recovery.
There's debate about where in the process encryption should take place. Software encryption works, but has a performance overhead that can affect the duration of the backup window and potentially lock the organisation into the backup software product.
Encryption can also be performed at the target using a dedicated appliance, such as nCipher Plc's CryptoStor family or NetApp's DataFort family. In the disk-to-disk storage space, virtual tape library (VTL) products are embracing encryption, such as with FalconStor Software's Secure Tape Transport Service module for its VirtualTape Library product.
Encryption algorithms require a unique key to encode and decode data, so you'll have to factor in management processes to deal with how the key is managed to prevent loss and minimise the possibility of compromised data.
Tape-based backup systems often include encryption.
Version sensitivity in backup products: Backups are often version-sensitive to the backup product, so you may face a situation where it's difficult or impossible to restore older backups once the software is updated. This can prove troublesome when faced with a legal discovery request that requires a search that dates back across multiple software versions. So before you select a backup platform, make sure you understand how it supports previous and future versions, as well as its interoperability with other vendors' backups.
Recovery: You should consider the recovery process associated with specific products carefully. The best backup is useless if you can't restore it. You need to know what's involved in restoration and who can do it. Restoration from tape, for example, will often require an experienced backup administrator, while disk-based or replication methods may allow users to recover individual files on demand.
Reporting and monitoring tools: Monitoring helps backup administrators understand the efficiency of backups by quantifying backup performance according to factors such as environment and time of backup. Information gained by using reporting tools can help users make improvements that can optimise or streamline the process -- for example, by shifting backups to a different time to make use of lower network usage.
Being able to identify bottlenecks in particular parts of the IT system can point to the need for network infrastructure upgrades or media changes. Reporting tools should also provide comprehensive and configurable information at a variety of levels: high, to provide backup statistics on a weekly or monthly basis; and low, to identify possible bottlenecks or media problems.
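A low-level report of this kind can be as simple as comparing each job's throughput with what is typical for the site. The Python sketch below assumes job records with a client name, bytes copied and duration; the BackupJob class, the slow_jobs function and the 50% threshold are all hypothetical choices for illustration, not features of any particular reporting tool.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class BackupJob:
    client: str
    bytes_copied: int
    seconds: float

    @property
    def mb_per_sec(self):
        """Throughput in megabytes (10**6 bytes) per second."""
        return self.bytes_copied / 1_000_000 / self.seconds


def slow_jobs(jobs, fraction=0.5):
    """Return jobs running below the given fraction of median throughput."""
    if not jobs:
        return []
    typical = median(job.mb_per_sec for job in jobs)
    return [job for job in jobs if job.mb_per_sec < fraction * typical]
```

Jobs flagged this way are candidates for a closer look at the network path, the client or the target media.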
Data protection and backup products can also provide alerts that allow specific events or status updates to be sent to IT staff by text or email.
Monitoring and reporting can be implemented via standalone products that are separate from backup software. EMC's Backup Advisor is one example.
Acronis Inc.
BakBone Software Inc.
BridgeHead Software Limited
Double-Take Software Inc.
FalconStor Software Inc.
Mimosa Systems Inc.
SteelEye Technologies Inc.
Yosemite Technologies Inc.
This was first published in January 2009