Backup and recovery windows are shrinking, and storage
professionals realize that today's disk-to-disk (D2D)
technologies won't fill every possible data protection role.
Snapshots and replication techniques go a long way toward
safeguarding data, but continuous data protection (CDP) can
preserve data when recovery point objectives (RPO) and recovery
time objectives (RTO) are razor thin and data loss simply isn't
an option. However, CDP is hardly a ubiquitous technology.
Innovative startups are being devoured by industry behemoths,
each wrestling to define the role and scope of CDP in the data
centre. This article highlights the factors and considerations
driving CDP technology, examines the leading vendors and the
products available now, and offers some practical purchase and
implementation advice.
Understanding continuous data protection
Simply put, CDP is a backup technology that records every write
transaction to disk in real time. Each write is saved to a journal
along with its timestamp and the data it changed. If data is lost
to human error, virus corruption or disk failure, a CDP platform
can restore any number of protected files to any point in time
recorded in the transaction journal. This approach is usually
termed true-CDP, and it offers unparalleled granularity for data
recovery. Restoration is quick because the content is recovered
directly from disk.
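To make the journaling idea concrete, here is a minimal Python sketch of a true-CDP write journal. It is a toy model, not any vendor's implementation: the `WriteJournal` class, its `record` and `restore` methods, and the representation of a volume as a dict of block numbers to bytes are all hypothetical. Restoring to a point in time simply replays every journaled write up to that moment on top of the starting state.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class WriteJournal:
    """Toy true-CDP journal (hypothetical): every write is kept with its time.

    A "volume" is modelled as a dict mapping block number -> bytes.
    """
    base: dict = field(default_factory=dict)     # volume contents when journaling began
    entries: list = field(default_factory=list)  # (timestamp, block, data), oldest first

    def record(self, ts, block, data):
        # This simple model requires writes to arrive in time order.
        if self.entries and ts < self.entries[-1][0]:
            raise ValueError("writes must be journaled in time order")
        self.entries.append((ts, block, data))

    def restore(self, point_in_time):
        """Rebuild the volume as of point_in_time by replaying the journal."""
        volume = dict(self.base)
        timestamps = [ts for ts, _, _ in self.entries]
        # bisect_right includes writes stamped exactly at point_in_time.
        cutoff = bisect_right(timestamps, point_in_time)
        for _, block, data in self.entries[:cutoff]:
            volume[block] = data
        return volume


journal = WriteJournal(base={0: b"v0"})
journal.record(10.0, 0, b"v1")
journal.record(20.0, 0, b"corrupted")  # e.g. a virus overwrites block 0
print(journal.restore(15.0))  # {0: b'v1'}
```

Because the journal keeps every write rather than periodic copies, any timestamp between two writes is a valid restore target.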
Unfortunately, not all vendors in the CDP market define CDP in
this way. For some storage vendors, CDP is basically a frequent
series of snapshots to disk -- an approach called near-CDP.
Although this is a perfectly sound technical approach, users must
understand that RPOs available with near-CDP products will be
larger than the RPOs possible with true-CDP products. For example,
Microsoft's Data Protection Manager (DPM) is touted as CDP even
though it takes hourly snapshots, resulting in an RPO of one
hour.
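The RPO difference comes down to simple arithmetic: with periodic snapshots, the newest recoverable copy is the last snapshot taken before the failure, so the worst-case loss approaches the full snapshot interval. The sketch below assumes snapshots start at a known time, repeat at a fixed interval and complete instantly; `latest_recovery_point` is a hypothetical helper, not part of DPM or any other product.

```python
from datetime import datetime, timedelta


def latest_recovery_point(failure, first_snapshot, interval):
    """Return the most recent snapshot taken at or before the failure.

    Assumes snapshots start at first_snapshot, repeat every interval and
    complete instantly (a simplification for illustration).
    """
    if failure < first_snapshot:
        raise ValueError("no snapshot exists before the failure")
    completed = (failure - first_snapshot) // interval  # whole intervals elapsed
    return first_snapshot + completed * interval


start = datetime(2024, 1, 1, 0, 0)
failure = datetime(2024, 1, 1, 9, 59)
point = latest_recovery_point(failure, start, timedelta(hours=1))
print(point, failure - point)  # 2024-01-01 09:00:00 0:59:00
```

With hourly snapshots, a failure one minute before the next snapshot loses 59 minutes of changes. A true-CDP journal, by contrast, can in principle roll back to just before the failure, because every write is recorded.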
CDP hardware and software
CDP products can be implemented as either software or hardware.
Software-based CDP is more common and includes tools like
Backup Exec 10d for Windows Servers from Symantec Corp. The software
is installed and configured on a host server and uses storage
resources currently available in the data centre. "In many cases,
you are able to utilize existing storage, where you're simply
carving off a new RAID group, a new LUN [logical unit number] and
then pointing the data towards that existing storage," says Phil
Goodwin, president of Diogenes Analytical Laboratories.
You can also opt for a hardware-based platform that comes
preconfigured with the corresponding software. A systems vendor or
integrator might install the software on a host for you. "The
Microsoft solution [DPM] in many cases is packaged with a system
like Hewlett-Packard or Dell," Goodwin says. "So they would supply
you with the RAID box and the server, etc." Alternatively, a vendor
might offer a purpose-built appliance running in-house CDP software,
such as the CPS-1200 continuous protection system and Application
Integration Suite from Revivio Inc.
CDP products can be deployed in-band or out-of-band. In-band
deployments place the CDP platform in the data path between
application servers and their storage. This often lets a single
platform protect more servers at once, but inserting a CDP system
into the data path can create a performance bottleneck. Out-of-band
CDP products reside outside the data path, so they won't become a
bottleneck for application I/O. However, agent software must
typically be installed on each protected server, which usually
means higher maintenance overhead for IT staff. Both approaches are
equally valid; it's a matter of matching your IT needs to the most
compatible product.
CDP disadvantages and limitations
While CDP might theoretically replace a snapshot system, a tape
device or another backup element, analysts strongly discourage that
practice. In practice, then, CDP supplements existing tools rather
than replacing them, so there is additional capital outlay for the
platform, along with the human capital needed to operate and
maintain it. "If you are in a
scenario where you're already 'maxed-out' on human bandwidth,
adding one more thing to it probably won't help much," Goodwin
says.
Some analysts point out that because CDP captures disk I/O, it may
miss data still held in application memory unless special
precautions are taken, potentially resulting in invalid or
incomplete data during a restore. "The CDP product may need to be
tied with another
mechanism," says Greg Schulz, founder and senior analyst at the
Storage I/O Group. "For example, tell Exchange to flush its buffers
so that you can capture them off disk."
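One way to pair CDP with an application's own flush mechanism is to log a consistency point (sometimes called a bookmark) in the journal right after the application confirms its buffers are on disk, and then restore only to such points. The sketch below assumes that design; `BookmarkedJournal`, `mark_consistent` and the caller-supplied `flush` callable are hypothetical names, and a real deployment would call the application's own flush or quiesce interface instead.

```python
from bisect import bisect_right


class BookmarkedJournal:
    """Hypothetical journal that records application-consistent bookmarks.

    A bookmark is logged only after the application has flushed its
    in-memory buffers, so restoring to a bookmark does not miss data
    that was still sitting in memory.
    """

    def __init__(self):
        self.bookmarks = []  # consistency-point timestamps, oldest first

    def mark_consistent(self, ts, flush):
        if self.bookmarks and ts < self.bookmarks[-1]:
            raise ValueError("bookmarks must be added in time order")
        flush()  # caller-supplied callable that asks the application to flush
        self.bookmarks.append(ts)

    def safe_restore_point(self, requested):
        """Return the latest consistency point at or before `requested`."""
        i = bisect_right(self.bookmarks, requested)
        if i == 0:
            raise LookupError("no consistent point at or before that time")
        return self.bookmarks[i - 1]


flushes = []
journal = BookmarkedJournal()
journal.mark_consistent(100.0, lambda: flushes.append("flush"))
journal.mark_consistent(200.0, lambda: flushes.append("flush"))
print(journal.safe_restore_point(150.0))  # 100.0
print(len(flushes))  # 2
```

The trade-off is granularity: restores snap back to the nearest bookmark instead of an arbitrary write, in exchange for data the application can actually open.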
Any traffic bottleneck or network disruption can seriously
impair CDP operation. Some vendors address this concern by
resynchronizing once the network disruption passes, but data
corruption remains a serious possibility. Goodwin describes the
effect on Microsoft's DPM. "If you don't complete that [near-CDP]
cycle inside the hour you've specified, and the next job gets
kicked off, it actually takes a considerable amount of
administrator intervention to get that [CDP] system back to a
consistent state," he says. "We did run into that in our
testing."
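Resynchronization after an outage can be sketched as follows. This is one common pattern, assumed here rather than taken from any vendor's product: the host agent queues writes while the link is down and replays them in order when it returns; if the queue fills, it falls back to tracking only which blocks changed and resends their current contents, so the write-by-write history for that period is lost. `ReplicationAgent` and its methods are hypothetical.

```python
from collections import deque


class ReplicationAgent:
    """Hypothetical host agent that ships each write to a CDP target.

    While the link is down, writes are queued (up to max_queue). Once the
    queue overflows, the agent only remembers *which* blocks changed; on
    reconnect it resends their current contents, so the intermediate
    write history for those blocks is lost.
    """

    def __init__(self, max_queue):
        self.max_queue = max_queue
        self.link_up = True
        self.queue = deque()  # writes held during an outage, oldest first
        self.dirty = set()    # blocks needing a full resend after overflow
        self.sent = []        # (block, data) pairs delivered to the target, in order

    def disconnect(self):
        self.link_up = False

    def write(self, block, data, volume):
        volume[block] = data  # the primary write always completes locally
        if self.link_up:
            self.sent.append((block, data))
        elif self.dirty or len(self.queue) >= self.max_queue:
            self.dirty.add(block)  # overflow: keep the block number, drop the history
        else:
            self.queue.append((block, data))

    def reconnect(self, volume):
        """Replay queued writes, then resend dirty blocks.

        Returns True if every individual write reached the target."""
        self.link_up = True
        while self.queue:
            self.sent.append(self.queue.popleft())
        history_intact = not self.dirty
        for block in sorted(self.dirty):
            self.sent.append((block, volume[block]))
        self.dirty.clear()
        return history_intact


volume = {}
agent = ReplicationAgent(max_queue=2)
agent.disconnect()
for i, data in enumerate([b"a", b"b", b"c", b"d"]):
    agent.write(i % 2, data, volume)
print(agent.reconnect(volume))  # False: the queue overflowed
```

The fallback path shows why disruptions are risky: after an overflow the target ends up with the right final data, but the points in time in between can no longer be restored.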
Analysts note that CDP journaling demands anywhere from 5% to
40% additional storage, so protecting 100 terabytes (TB) of data
may require another 5 TB to 40 TB of disk space. CDP journals
usually don't need to be backed up themselves, because CDP is
typically implemented as part of a broader data protection scheme:
all of the data that CDP protects is periodically backed up to
snapshots, virtual tape libraries (VTLs) or tape, or replicated,
anyway. Once a primary backup completes successfully, the CDP
journal can be deleted and started again from scratch. A CDP
journal should be backed up only in the rare cases where CDP is
deployed as your principal means of data protection.
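The sizing rule and the journal lifecycle above can be sketched in a few lines. The 5% and 40% defaults come from the analyst range quoted in this article; `journal_capacity_tb`, `TruncatingJournal` and `on_backup_complete` are hypothetical names, and clearing the journal only after a backup succeeds is an assumption about sensible practice rather than a description of a specific product.

```python
def journal_capacity_tb(protected_tb, low=0.05, high=0.40):
    """Return the (minimum, maximum) extra disk, in TB, for a CDP journal.

    Uses the 5%-40% overhead range cited by analysts; actual overhead
    will vary with write volume and how long the journal is kept.
    """
    return protected_tb * low, protected_tb * high


class TruncatingJournal:
    """Hypothetical CDP journal that is reset after each good primary backup."""

    def __init__(self):
        self.entries = []

    def record(self, entry):
        self.entries.append(entry)

    def on_backup_complete(self, succeeded):
        # Discard the journal only once the backup is known to be good;
        # after a failed backup the journal still holds the only
        # fine-grained copy of recent changes.
        if succeeded:
            self.entries.clear()


print(journal_capacity_tb(100))  # (5.0, 40.0)

journal = TruncatingJournal()
journal.record("write 1")
journal.on_backup_complete(succeeded=False)
print(len(journal.entries))  # 1
journal.on_backup_complete(succeeded=True)
print(len(journal.entries))  # 0
```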
CDP vendors and product selection
Revivio is one of the most notable true-CDP hardware vendors,
offering its CPS-1200 continuous protection system appliance, which
is intended to eliminate backup windows and provide extremely short
RTOs. The CPS-1200 runs Revivio's agentless Application Integration
Suite (AIS) software, which includes modules for specific
applications such as Oracle, Sybase and Microsoft SQL Server
databases, Microsoft Exchange and Lotus Notes. AIS also integrates
with popular backup software such as Symantec's NetBackup and EMC
Corp.'s NetWorker.
The Business Continuity Server (BCS) from Asempra Technologies
is another CDP hardware appliance intended to protect a variety of
hosts. Unlike Revivio's offering, however, Asempra's BCS relies on
agents installed on each protected host, which capture transaction
events and send the updated application data to storage. Those
agents are one more piece of software that IT staff will have to
install, maintain and update.