At the simplest level, all
backups produce copies of valuable data that protect an
organization against loss, theft, failure and other unforeseen
calamities. But traditional "backups" are typically relegated to
slow tape or optical media where data is not intended for ready
access. "Copies" are a bit different, usually duplicating data to
nearby
disk or a remote storage system and allowing quick restoration
when the original data is compromised. Today, data backup copies go
far beyond simple disk-to-disk file transfers. There's growing
specialization and diversity in data copies -- allowing storage
administrators to select copy platforms that meet specific
requirements for performance and retention within the enterprise.
Mirroring and replication
These two storage technologies are certainly the purest and most
straightforward types of data copies.
Mirroring, as the name implies, protects the data on one disk
by copying the contents verbatim to another disk -- usually in the
same storage system. The most common example of mirroring is a
storage array supporting
RAID 1 (or a mirrored variant such as RAID 10). Mirroring is typically a dynamic activity
that keeps the contents of both disks continuously in
synchronization. If one disk fails, the mirrored disk takes over on
the fly to provide access to data. Ideally, a data user never even
knows that a disk problem has occurred. Once the failed disk is
exchanged, it's reconstructed directly from the mirrored disk.
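The mirroring behavior described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the `MirroredPair` class and its method names are hypothetical, the "disks" are in-memory dictionaries, and no real array works at this level of simplicity.

```python
class MirroredPair:
    """Toy model of a two-disk mirror; each disk maps block number -> bytes."""

    def __init__(self):
        self.disks = [{}, {}]            # two in-memory stand-ins for disks
        self.healthy = [True, True]

    def write(self, block, data):
        # Every write goes to all healthy disks, keeping them in sync.
        for i, disk in enumerate(self.disks):
            if self.healthy[i]:
                disk[block] = data

    def read(self, block):
        # Serve the read from the first healthy disk, so the caller
        # never notices that the other disk has failed.
        for i, disk in enumerate(self.disks):
            if self.healthy[i]:
                return disk[block]
        raise IOError("both disks have failed")

    def fail(self, index):
        # Simulate a disk failure: the disk's contents are lost.
        self.healthy[index] = False
        self.disks[index] = {}

    def replace(self, index):
        # Rebuild the replacement disk directly from the surviving mirror.
        survivor = self.disks[1 - index]
        self.disks[index] = dict(survivor)
        self.healthy[index] = True


pair = MirroredPair()
pair.write(0, b"orders")
pair.fail(0)
after_failure = pair.read(0)     # still served by the surviving disk
pair.write(1, b"invoices")       # lands only on the surviving disk
pair.replace(0)                  # rebuild copies both blocks back
```

After the rebuild, both dictionaries hold the same two blocks, which is the property a mirror is meant to guarantee.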
Replication is very similar to mirroring in that an exact copy
of data is created. In fact, the two terms are often used
interchangeably. However, replication is normally a "box-to-box"
copy process rather than disk-to-disk. For example, replication
software can image a storage system to another server or a
dedicated replication appliance from vendors like Network Appliance
Inc. (NetApp). The "box-to-box" approach to copying data allows a
physical separation in storage systems. A replication system may be
located outside of a data center elsewhere on a corporate campus or
even located in another geographic region -- allowing for a measure
of disaster planning.
Snapshots and CDP
Recovery time is certainly important because lost data must be
restored in an acceptable period of time. But storage
administrators are increasingly worried about the recovery point --
how current is your backup? Skilled IT personnel may be able to
restore hundreds of gigabytes from a backup in a matter of hours,
but if the backup is one week old, you'll still need to re-create
up to a week's worth of missing or changed data not protected by
the backup. This can be costly in terms of lost sales and lost
productivity, and lost information may be impossible to
re-create.
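The recovery point arithmetic is simple enough to write down. The sketch below uses hypothetical dates and function names purely for illustration: it measures how much work falls outside the last backup when a failure occurs.

```python
from datetime import datetime, timedelta


def data_at_risk(last_backup: datetime, failure: datetime) -> timedelta:
    """Window of changes that the most recent backup does not contain."""
    return failure - last_backup


def worst_case_exposure(backup_interval: timedelta) -> timedelta:
    # A failure just before the next scheduled backup loses a full
    # interval's worth of changes.
    return backup_interval


# Hypothetical weekly backup taken Monday at 02:00, failure on Saturday at 14:00.
last_backup = datetime(2024, 1, 1, 2, 0)
failure = datetime(2024, 1, 6, 14, 0)

exposed = data_at_risk(last_backup, failure)
weekly_worst_case = worst_case_exposure(timedelta(weeks=1))
```

Here `exposed` comes to five and a half days of changes that would have to be re-created by hand, no matter how quickly the restore itself completes.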
One way to shrink a recovery point is to track changes to the
storage environment, periodically recording any data changes that
take place. A
storage snapshot basically makes a complete "point-in-time"
copy of a SAN drive set. Since a SAN typically mirrors its data
through RAID anyway, a snapshot effectively creates a third data
copy that is kept apart from the main production copies. This is
often called a split mirror, though some vendors, like EMC Corp.,
refer to it as a business continuance volume (BCV). A split
mirror snapshot is periodically synchronized with the SAN, and can
sometimes be accomplished in just a few seconds. Since snapshots
can be accomplished much faster than traditional tape backups, a
snapshot can be taken several times per day -- even several times
per hour -- dramatically shortening an organization's recovery
point. NetApp and EMC are two vendors touting snapshot technology
in storage products.
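To illustrate how frequent snapshots shorten the recovery point, here is a minimal Python sketch. It assumes a hypothetical `SnapshotStore` that keeps full point-in-time copies of a small in-memory "volume"; real products work at the array level and are far more space-efficient.

```python
import bisect
import copy


class SnapshotStore:
    """Keeps full point-in-time copies, indexed by the time they were taken."""

    def __init__(self):
        self.times = []     # snapshot timestamps, in ascending order
        self.images = []    # the copy taken at the matching timestamp

    def take(self, timestamp, volume):
        # Deep-copy so later changes to the live volume do not alter the snapshot.
        self.times.append(timestamp)
        self.images.append(copy.deepcopy(volume))

    def restore(self, before):
        """Return (timestamp, image) of the newest snapshot taken at or before `before`."""
        i = bisect.bisect_right(self.times, before)
        if i == 0:
            raise LookupError("no snapshot that old")
        return self.times[i - 1], self.images[i - 1]


# Timestamps are minutes since the start of the day (an assumption for brevity).
volume = {"orders.db": "v1"}
store = SnapshotStore()
store.take(0, volume)
volume["orders.db"] = "v2"
store.take(15, volume)
volume["orders.db"] = "v3"
store.take(30, volume)
volume["orders.db"] = "corrupt"      # corruption strikes at minute 40

taken_at, image = store.restore(before=39)
```

With a snapshot every 15 minutes, the restore lands on the minute-30 copy, so at most 15 minutes of changes are ever exposed.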
Some organizations need extremely granular restore points that
even frequent snapshots cannot adequately support. Busy sales
organizations are one example of this, where any data lost since
the last backup would simply be unacceptable.
Continuous data protection (CDP) fills this need, tracking
every storage transaction to maintain a continuous image of the
storage environment on disk. In effect, CDP creates a snapshot for
each moment in time where a data change occurs. This produces an
extremely detailed journal of changes that an administrator can use
to correct data loss or corruption. Files can be recovered from
system states that are days past or just moments old. CDP products
are available from vendors like Mendocino Software, Revivio Inc.,
Asempra Technologies, TimeSpring Software Corp. and others
[see the SearchStorage.com article on CDP].
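The journal idea behind CDP can be sketched as an append-only log that is replayed up to a chosen moment. The `ChangeJournal` class below is a hypothetical illustration rather than any vendor's design, and it assumes writes are recorded in time order.

```python
class ChangeJournal:
    """Append-only log of every write, replayable to any point in time."""

    def __init__(self):
        self.entries = []   # (timestamp, key, value), in ascending time order

    def record(self, timestamp, key, value):
        self.entries.append((timestamp, key, value))

    def state_at(self, timestamp):
        """Rebuild the data as it stood at `timestamp` by replaying earlier writes."""
        state = {}
        for t, key, value in self.entries:
            if t > timestamp:
                break           # entries are ordered, so nothing later applies
            state[key] = value
        return state


journal = ChangeJournal()
journal.record(1, "price.txt", "10.00")
journal.record(2, "price.txt", "12.50")
journal.record(3, "price.txt", "###garbage###")   # a corrupting write

good = journal.state_at(2)     # roll back to just before the corruption
latest = journal.state_at(3)
```

Because every write is kept, the administrator can pick the instant just before the bad write instead of settling for the nearest scheduled snapshot.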
Vendor influence and competition have split the CDP market into
two categories: "true CDP" and "near CDP". True CDP products track
every single storage change or I/O as described above -- the
definition largely supported by SNIA. By contrast, near CDP is
basically a practice of taking frequent snapshots, perhaps several times
each hour. This is best represented in products like Microsoft's
Data Protection Manager (DPM).
CAS and VTL
A burgeoning amount of corporate data must be retained for long
periods with only infrequent or occasional access. But when the
data is called for, it must be produced very quickly. Some typical
examples are bank statements and check images, or medical records
and patient images. Conventional Fibre Channel
SANs are too costly for such storage, and their high performance
is wasted, while tape storage is simply too slow when the data is
actually needed.
Content addressed storage (CAS) meets this need, retaining data
on low-cost, high-volume
SAS or
SATA disks.
Unlike basic replication systems, however, CAS platforms
optimize the use of disk space through data deduplication and
compression technologies. For example, a corporate data center may
store a dozen copies of the same sales presentation. If the data
center were backed up to CAS, data deduplication would only save
one copy of the file -- sometimes reducing disk space requirements
as much as 50-to-1. CAS also includes security features that manage
retention and prevent data from being changed or tampered with.
Encryption is often added to curtail unauthorized access
[see the SearchStorage.com article on CAS].
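The deduplication effect described above follows naturally from addressing data by its content. The Python sketch below is a simplified illustration with hypothetical names (`ContentStore`, `put`, `get`); it uses a SHA-256 digest as the address, which is an assumption for the example and not a statement about how any particular CAS product works.

```python
import hashlib


class ContentStore:
    """Stores each unique piece of content once, keyed by its SHA-256 digest."""

    def __init__(self):
        self.blobs = {}     # digest -> bytes (one physical copy per unique content)
        self.names = {}     # file name -> digest

    def put(self, name, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)     # identical content is not stored again
        self.names[name] = digest
        return digest

    def get(self, name) -> bytes:
        digest = self.names[name]
        data = self.blobs[digest]
        # Re-hashing on read detects content that was altered after storage.
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError("content does not match its address")
        return data


store = ContentStore()
deck = b"Q3 sales presentation" * 1000
for user in range(12):                          # a dozen copies of the same file
    store.put(f"user{user}/sales.ppt", deck)

logical = sum(len(store.get(name)) for name in store.names)
physical = sum(len(blob) for blob in store.blobs.values())
```

Twelve file names point at a single stored blob, so `logical` is twelve times `physical`. The same digest check that enables deduplication also gives a simple form of tamper detection.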
Disk offers much better performance than tape, but
companies with substantial investments in backup software and tape
library hardware are often hesitant to make the move to disk-based
copies. A
virtual tape library (VTL) backs up data to a disk-based
storage system that is specifically designed to emulate a tape
library in the storage infrastructure. This type of technology
allows storage administrators to reap better backup and restoration
times, while preserving their current investments in backup
software, backup procedures and in-house expertise.
VTL is not generally intended to make tape obsolete. In many cases,
VTL products include tape system support. Once a backup to VTL disk
is finished, the VTL system itself can then produce a backup tape
for off-site or long-term archival storage -- sometimes referred to
as disk-to-disk-to-tape (D2D2T). This interim step eliminates much
of the downtime needed for conventional backups directly to tape
[see the SearchStorage.com article on VTL].
Making a purchasing decision
So how do you choose a storage copy technology? In truth, there
are business and technical issues to consider -- far too many to
detail here. But the best choices usually draw from three areas:
need and cost, vendor support, and technological
compatibility/longevity.
First, you'll need to match business objectives to the
particular storage technology. That is, "what do you need to
accomplish?" For example, any storage array will protect your data
through RAID mirroring, but data that must be guarded against flood
or other disaster may benefit more from off-site replication. If
the need is to reduce a recovery time objective from 12 hours to 4
hours, a snapshot technology may be worth considering. The idea is
to decide what storage goals are needed, and then examine the
technology that will best address those goals. Once a suitable
technology is identified, its total cost of ownership and return on
investment must make financial sense. An ideal technology may be
impractical if it costs too much to acquire, implement or
maintain.
Once a technology is identified, it's important to find a vendor
that can provide the necessary products and support. This is harder
than it sounds. Startup companies often have superior product
technologies or features, but are usually hard-pressed to offer
service to large or geographically diverse organizations. Many
large companies avoid startups for this reason -- often waiting for
startups to be acquired or to OEM their products to larger
vendors.
Finally, what does it take to make a storage product work? Prior
to any purchase commitment, a prospective product should be brought
in-house where its performance, compatibility and maintenance needs
can be fully evaluated. Compatibility with the existing storage
infrastructure is critically important -- no storage administrator
wants to rip out working infrastructure to accommodate a new
storage system. Testing is also an ideal time to determine the
amount of staff time needed to support a product. A product may
work perfectly but simply take too many IT resources to be an
effective acquisition.