Data backup copies overview
How do you choose a storage copy technology? The best choices usually involve need and cost, vendor support, and technological compatibility/longevity.
At the simplest level, all backups produce copies of valuable data that protect an organization against loss, theft, failure and other unforeseen calamities. But traditional "backups" are typically relegated to slow tape or optical media where data is not intended for ready access.
"Copies" are a bit different, usually duplicating data to nearby disk or a remote storage system and allowing quick restoration when the original data is compromised. Today, data backup copies go far beyond simple disk-to-disk file transfers. There's growing specialization and diversity in data copies -- allowing storage administrators to select copy platforms that meet specific requirements for performance and retention within the enterprise.
Mirroring and replication
These two storage technologies are the purest and most straightforward types of data copies. Mirroring, as the name implies, protects the data on one disk by copying the contents verbatim to another disk -- usually in the same storage system. The most common example of mirroring is a storage array configured for RAID 1 or a mirrored variant such as RAID 10. Mirroring is typically a dynamic activity that keeps the contents of both disks continuously synchronized. If one disk fails, the mirrored disk takes over on the fly to provide access to data. Ideally, a data user never even knows that a disk problem has occurred. Once the failed disk is replaced, the new disk is rebuilt directly from the surviving mirror.
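The mirroring behavior described above can be sketched in a few lines of Python. This is a toy model only, assuming two in-memory dictionaries stand in for disks; the class name MirroredPair and its methods are hypothetical and do not correspond to any vendor's RAID implementation.

```python
class MirroredPair:
    """Toy RAID 1-style mirror: two in-memory 'disks' kept in sync."""

    def __init__(self):
        # Each disk maps block number -> bytes; None marks a failed disk.
        self.disks = [{}, {}]

    def write(self, block, data):
        # Every write goes to all surviving disks so they stay identical.
        for disk in self.disks:
            if disk is not None:
                disk[block] = data

    def read(self, block):
        # Serve the read from the first surviving disk.
        for disk in self.disks:
            if disk is not None:
                return disk[block]
        raise IOError("both disks have failed")

    def fail(self, index):
        # Simulate a disk failure.
        self.disks[index] = None

    def replace(self, index):
        # Rebuild the replacement disk from the surviving mirror.
        survivor = self.disks[1 - index]
        if survivor is None:
            raise IOError("no surviving disk to rebuild from")
        self.disks[index] = dict(survivor)


pair = MirroredPair()
pair.write(0, b"orders")
pair.fail(0)                        # disk 0 dies
pair.write(1, b"invoices")          # writes continue on disk 1
data_after_failure = pair.read(0)   # reads are still served
pair.replace(0)                     # new disk is rebuilt from disk 1
rebuilt = pair.disks[0]
```

In this sketch the caller never sees the failure: reads and writes keep working on the surviving disk, and the replacement is populated from it.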
Replication is very similar to mirroring in that an exact copy of data is created. In fact, the two terms are often used interchangeably. However, replication is normally a "box-to-box" copy process rather than disk-to-disk. For example, replication software can image a storage system to another server or a dedicated replication appliance from vendors like Network Appliance Inc. (NetApp). The "box-to-box" approach to copying data allows a physical separation in storage systems. A replication system may be located outside of a data center elsewhere on a corporate campus or even located in another geographic region -- allowing for a measure of disaster planning.
Snapshots and CDP
Recovery time is certainly important because lost data must be restored in an acceptable period of time. But storage administrators are increasingly worried about the recovery point -- how current is your backup? Skilled IT personnel may be able to restore hundreds of gigabytes from a backup in a matter of hours, but if the backup is one week old, you'll still need to re-create up to a week's worth of missing or changed data not protected by the backup. This can be costly in terms of lost sales and lost productivity, and lost information may be impossible to re-create.
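The recovery point arithmetic is simple enough to work through directly. The following is a minimal sketch, assuming backups run on a fixed schedule; the function names and the sample dates are illustrative only.

```python
from datetime import datetime, timedelta


def data_at_risk(last_backup, failure_time):
    """Return the window of changes not covered by the most recent backup."""
    if failure_time < last_backup:
        raise ValueError("failure time precedes the last backup")
    return failure_time - last_backup


def worst_case_loss(backup_interval):
    """With a fixed schedule, the worst case is a failure just before the next backup."""
    return backup_interval


weekly = worst_case_loss(timedelta(weeks=1))
hourly = worst_case_loss(timedelta(hours=1))

# A failure on Saturday afternoon after a backup early Monday morning.
gap = data_at_risk(datetime(2024, 1, 1, 2, 0), datetime(2024, 1, 6, 14, 30))
```

Here the gap works out to five and a half days of changes that would need to be re-created, regardless of how quickly the backup itself can be restored.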
One way to shrink a recovery point is to track changes to the storage environment, periodically recording any data changes that take place. A storage snapshot basically makes a complete "point-in-time" copy of a SAN drive set. Since a SAN typically mirrors its data through RAID anyway, a snapshot effectively creates a third data copy that is kept apart from the main production copies. This is often called split mirror, though some vendors, like EMC Corp., refer to this as a business continuance volume (BCV). A split mirror snapshot is periodically synchronized with the SAN, and can sometimes be accomplished in just a few seconds. Since snapshots can be accomplished much faster than traditional tape backups, a snapshot can be taken several times per day -- even several times per hour -- dramatically shortening an organization's recovery point. NetApp and EMC are two vendors touting snapshot technology in storage products.
Some organizations need extremely granular restore points that even frequent snapshots cannot adequately support. Busy sales organizations are one example of this, where any data lost since the last backup would simply be unacceptable. Continuous data protection (CDP) fills this need, tracking every storage transaction to maintain a continuous image of the storage environment on disk. In effect, CDP creates a snapshot for each moment in time where a data change occurs. This produces an extremely detailed journal of changes that an administrator can use to correct data loss or corruption. Files can be recovered from system states that are days past or just moments old. CDP products are available from vendors like Mendocino Software, Revivio Inc., Asempra Technologies, TimeSpring Software Corp. and others [see the SearchStorage.com article on CDP].
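The journal idea behind CDP can be illustrated with a small Python sketch. It assumes a simple key-value view of storage and integer timestamps; the ChangeJournal class is hypothetical and is not modeled on any of the products named above.

```python
class ChangeJournal:
    """Toy CDP journal: every write is logged so any past state can be rebuilt."""

    def __init__(self):
        self._entries = []  # (timestamp, key, value) tuples in write order

    def record(self, timestamp, key, value):
        # Timestamps must be non-decreasing so replay order matches write order.
        if self._entries and timestamp < self._entries[-1][0]:
            raise ValueError("timestamps must not go backwards")
        self._entries.append((timestamp, key, value))

    def state_at(self, timestamp):
        # Replay the journal up to and including the requested moment.
        state = {}
        for ts, key, value in self._entries:
            if ts > timestamp:
                break
            if value is None:
                state.pop(key, None)  # None marks a deletion
            else:
                state[key] = value
        return state


journal = ChangeJournal()
journal.record(1, "order-17", "pending")
journal.record(2, "order-17", "paid")
journal.record(3, "order-17", "corrupted!!")
journal.record(4, "order-18", "pending")

before_corruption = journal.state_at(2)  # roll back to just before the bad write
latest = journal.state_at(4)
```

Because every change is kept, the administrator can pick the moment just before the corruption rather than the nearest scheduled snapshot.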
Vendor influence and competition have split the CDP market into two categories: "true CDP" and "near CDP." True CDP products track every single storage change or I/O as described above -- the definition largely supported by SNIA. By contrast, near CDP is basically a practice of taking frequent snapshots, perhaps several times each hour. This is best represented in products like Microsoft's Data Protection Manager (DPM).
CAS and VTL
A burgeoning amount of corporate data must be retained for long periods with only infrequent or occasional access. But when the data is called for, it must be produced very quickly. Some typical examples are bank statements and check images, or medical records and patient images. Conventional Fibre Channel SANs are too costly for such storage, and their high performance is wasted, while tape storage is simply too slow when the data is actually needed. Content addressed storage (CAS) meets this need, retaining data on low-cost, high-volume SAS or SATA disks.
Unlike basic replication systems, however, CAS platforms optimize the use of disk space through data deduplication and compression technologies. For example, a corporate data center may store a dozen copies of the same sales presentation. If the data center were backed up to CAS, data deduplication would only save one copy of the file -- sometimes reducing disk space requirements as much as 50-to-1. CAS also includes security features that manage retention and prevent data from being changed or tampered with. Encryption is often added to curtail unauthorized access [see the SearchStorage.com article on CAS].
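The deduplication effect in the sales presentation example can be sketched as follows. This is a minimal illustration, assuming whole-file deduplication keyed by a SHA-256 digest; the ContentStore class is hypothetical, and real CAS platforms may use different hash functions, chunking and retention controls.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: the address of an object is the hash of its bytes."""

    def __init__(self):
        self._blobs = {}  # digest -> bytes

    def put(self, data):
        digest = hashlib.sha256(data).hexdigest()
        # Identical content hashes to the same address, so it is stored only once.
        self._blobs.setdefault(digest, data)
        return digest

    def get(self, address):
        data = self._blobs[address]
        # Re-hashing on read detects content that was changed after it was stored.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("stored content does not match its address")
        return data

    def stored_bytes(self):
        return sum(len(blob) for blob in self._blobs.values())


store = ContentStore()
presentation = b"Q3 sales presentation" * 1000

# Back up a dozen identical copies of the same file.
addresses = [store.put(presentation) for _ in range(12)]

logical_bytes = 12 * len(presentation)
physical_bytes = store.stored_bytes()
```

All twelve copies resolve to one address and one stored object, so the space used is that of a single file. The same addressing scheme is what lets the store detect tampering on read.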
Disk offers much better performance than tape, but companies with substantial investments in backup software and tape library hardware are often hesitant to make the move to disk-based copies. A virtual tape library (VTL) backs up data to a disk-based storage system that is specifically designed to emulate a tape library in the storage infrastructure. This type of technology allows storage administrators to reap better backup and restoration times, while preserving their current investments in backup software, backup procedures and in-house expertise.
VTL is not generally intended to make tape obsolete. In many cases, VTL products include tape system support. Once a backup to VTL disk is finished, the VTL system itself can then produce a backup tape for off-site or long-term archival storage -- an approach sometimes referred to as disk-to-disk-to-tape (D2D2T). This interim step eliminates much of the downtime needed for conventional backups directly to tape [see the SearchStorage.com article on VTL].
Making a purchasing decision
So how do you choose a storage copy technology? In truth, there are business and technical issues to consider -- far too many to detail here. But the best choices usually draw from three areas: need and cost, vendor support, and technological compatibility/longevity.
First, you'll need to match business objectives to the particular storage technology. That is, "what do you need to accomplish?" For example, any storage array will protect your data through RAID mirroring, but data that must be guarded against flood or other disaster may benefit more from off-site replication. If the need is to reduce a recovery time objective from 12 hours to 4 hours, a snapshot technology may be worth considering. The idea is to decide what storage goals are needed, and then examine the technology that will best address those goals. Once a suitable technology is identified, its total cost of ownership and return on investment must make financial sense. An ideal technology may be impractical if it costs too much to acquire, implement or maintain.
Once a technology is identified, it's important to find a vendor that can provide the necessary products and support. This is harder than it sounds. Startup companies often have superior product technologies or features, but are usually hard-pressed to offer service to large or geographically diverse organizations. Many large companies avoid startups for this reason -- often waiting for startups to be acquired or to OEM their products to larger vendors.
Finally, what does it take to make a storage product work? Prior to any purchase commitment, a prospective product should be brought in-house where its performance, compatibility and maintenance needs can be fully evaluated. Compatibility with the existing storage infrastructure is critically important -- no storage administrator wants to rip out working infrastructure to accommodate a new storage system. Testing is also an ideal time to determine the amount of staff time needed to support a product. A product may work perfectly but simply demand too many IT resources to be an effective acquisition.