Understanding continuous data protection
Simply put, continuous data protection (CDP) is a backup technology that records write transactions to disk in real time. Writes are saved to a journal file along with the corresponding file changes. If data is lost through human error, virus corruption or disk failure, a CDP platform can restore any number of protected files to any moment in time recorded in the transaction journal. This approach is usually termed true-CDP, and it offers the finest granularity available for data recovery. Restoration is quick because the content is recovered directly from disk.
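The journal-and-replay idea can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not how any particular product works: the names JournalEntry, WriteJournal, record and restore are hypothetical, each entry stores the full file contents rather than a block-level change, and entries are assumed to arrive in time order.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class JournalEntry:
    timestamp: float  # when the write was recorded
    path: str         # hypothetical identifier for the protected file
    data: bytes       # full contents after the write (a simplification)


@dataclass
class WriteJournal:
    entries: List[JournalEntry] = field(default_factory=list)

    def record(self, timestamp: float, path: str, data: bytes) -> None:
        """Append one write to the journal (assumed to arrive in time order)."""
        self.entries.append(JournalEntry(timestamp, path, data))

    def restore(self, point_in_time: float) -> Dict[str, bytes]:
        """Replay every write up to and including point_in_time."""
        state: Dict[str, bytes] = {}
        for entry in self.entries:
            if entry.timestamp > point_in_time:
                break  # later writes are ignored for this restore
            state[entry.path] = entry.data
        return state


journal = WriteJournal()
journal.record(1.0, "report.doc", b"draft")
journal.record(2.0, "report.doc", b"final")
journal.record(3.0, "report.doc", b"corrupted")

# Roll back to a moment just before the bad write at t=3.0.
before_corruption = journal.restore(2.5)
print(before_corruption)
```

Because every write is kept, the restore point can be any timestamp covered by the journal, not just the moments when a scheduled job happened to run.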
Unfortunately, not all vendors in the CDP market define CDP in this way. For some storage vendors, CDP is basically a frequent series of snapshots to disk -- an approach called near-CDP. Although this is a perfectly sound technical approach, users must understand that the recovery point objectives (RPOs) available with near-CDP products will be longer than the RPOs possible with true-CDP products. For example, Microsoft's Data Protection Manager (DPM) is touted as CDP even though it takes hourly snapshots, resulting in an RPO of one hour.
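The difference in RPO can be made concrete with a small calculation. The sketch below is illustrative only: the function name data_loss_window and the sample timestamps (in seconds) are assumptions, and the "true-CDP" list simply stands in for a journal that records every write.

```python
from typing import Sequence


def data_loss_window(failure_time: int, recovery_points: Sequence[int]) -> int:
    """Seconds of work lost: failure time minus the latest earlier recovery point."""
    earlier = [point for point in recovery_points if point <= failure_time]
    if not earlier:
        raise ValueError("no recovery point exists before the failure")
    return failure_time - max(earlier)


# Near-CDP: hourly snapshots at 0 s, 3600 s and 7200 s.
hourly_snapshots = [0, 3600, 7200]

# True-CDP: every write is journaled, so the last write is the last recovery point.
journaled_writes = [0, 1800, 3600, 7095]

failure_at = 7100  # failure strikes shortly before the third snapshot

near_cdp_loss = data_loss_window(failure_at, hourly_snapshots)
true_cdp_loss = data_loss_window(failure_at, journaled_writes)
print(near_cdp_loss, true_cdp_loss)
```

With these sample values the hourly schedule loses almost a full hour of changes, while the journal loses only the few seconds since the last recorded write.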
CDP hardware and software
CDP products can be implemented as either software or hardware. Software-based CDP is more common and includes tools like Backup Exec 10d for Windows Servers from Symantec Corp. The software is installed and configured on a host server and uses storage resources already available in the data centre. "In many cases, you are able to utilize existing storage, where you're simply carving off a new RAID group, a new LUN [logical unit number] and then pointing the data towards that existing storage," says Phil Goodwin, president of Diogenes Analytical Laboratories.
You can also opt for a hardware-based platform already configured with the corresponding software. A systems vendor or integrator might install the software onto a host for you. "The Microsoft solution [DPM] in many cases is packaged with a system like Hewlett-Packard or Dell," Goodwin says. "So they would supply you with the RAID box and the server, etc." Alternatively, a vendor might offer a purpose-built appliance using in-house CDP software, such as the Continuous Protection System CPS-1200 and Application Integration Suite from Revivio Inc.
CDP products can be deployed in-band or out-of-band. In-band deployments sit in the data path between application servers and the CDP platform. This often allows better coverage of more servers simultaneously, but the inserted CDP system can become a performance bottleneck. Out-of-band CDP products reside outside of the data path and will not impede performance. However, agent software must typically be installed on each protected server -- usually resulting in higher maintenance overhead for IT staff. Both approaches are equally acceptable; it's just a matter of matching your IT needs to the most compatible product.
CDP disadvantages and limitations
While CDP might theoretically replace a snapshot system, a tape device or another backup element, analysts strongly discourage that practice. Consequently, CDP is an addition to the existing backup environment rather than a replacement for any part of it, so there is additional capital outlay for the platform, along with the human capital needed to operate and maintain it. "If you are in a scenario where you're already 'maxed-out' on human bandwidth, adding one more thing to it probably won't help much," Goodwin says.
Some analysts point out that the disk I/O-centric nature of CDP may not capture memory contents without special precautions -- potentially resulting in invalid or incomplete data during a restore. "The CDP product may need to be tied with another mechanism," says Greg Schulz, founder and senior analyst at the Storage I/O Group. "For example, tell Exchange to flush its buffers so that you can capture them off disk."
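The general problem Schulz describes, data sitting in memory that a disk-level capture cannot see, can be illustrated with ordinary file I/O. The sketch below is a generic Python example and not the mechanism Exchange or any CDP product uses; the helper name write_and_flush and the file name app.log are hypothetical.

```python
import os
import tempfile


def write_and_flush(path: str, payload: bytes) -> None:
    """Append payload and force it out of memory buffers onto the storage device."""
    with open(path, "ab") as handle:
        handle.write(payload)      # may still sit in the process's own buffer
        handle.flush()             # hand the buffered bytes to the operating system
        os.fsync(handle.fileno())  # ask the OS to commit them to the device


with tempfile.TemporaryDirectory() as workdir:
    log_path = os.path.join(workdir, "app.log")
    write_and_flush(log_path, b"transaction 1\n")

    # Anything that reads the disk now sees the complete record.
    with open(log_path, "rb") as handle:
        on_disk = handle.read()

print(on_disk)
```

Until the flush and sync steps run, a tool that watches only disk writes has nothing to capture, which is why an application-level flush may need to be coordinated with the CDP product.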
Any traffic bottleneck or network disruption can seriously impair CDP operation. Some vendors address this concern by resynchronizing once the network disruption passes, but data corruption is a serious possibility. Goodwin notes the adverse effects on Microsoft's DPM. "If you don't complete that [near-CDP] cycle inside the hour you've specified, and the next job gets kicked off, it actually takes a considerable amount of administrator intervention to get that [CDP] system back to a consistent state," he says. "We did run into that in our testing."
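The overrun condition Goodwin describes is easy to check for in principle. The following sketch assumes you can obtain the duration of each snapshot job from your own logs; the function name overrunning_jobs and the sample durations are hypothetical and do not come from DPM.

```python
from typing import List, Sequence


def overrunning_jobs(durations_minutes: Sequence[int],
                     interval_minutes: int = 60) -> List[int]:
    """Return the indexes of jobs still running when the next job was due to start."""
    return [
        index
        for index, duration in enumerate(durations_minutes)
        if duration > interval_minutes
    ]


# Hypothetical durations of four consecutive hourly snapshot jobs, in minutes.
job_durations = [42, 61, 55, 75]

late_jobs = overrunning_jobs(job_durations)
print(late_jobs)
```

A check like this only flags the risk; per Goodwin's testing, recovering from an actual overlap took considerable administrator intervention.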
Analysts note that CDP journaling demands anywhere from 5% to 40% additional storage overhead, so protecting 100 terabytes (TB) of data may require another 5 TB to 40 TB of disk space. Backups are usually not required for CDP journals because CDP is implemented as part of a broader data protection scheme, so all of the data that CDP protects is periodically replicated or backed up to snapshots, virtual tape libraries (VTLs) or tape anyway. Once a primary backup completes successfully, the CDP journal can be deleted and started again from scratch. The only time a CDP journal should be backed up is in the rare case where CDP is deployed as your principal means of data protection.
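The overhead arithmetic is simple enough to script for your own capacity figures. This is a rough sizing sketch using the 5% to 40% range quoted above; the function name journal_overhead_tb is hypothetical, and real overhead depends on change rate and retention.

```python
from typing import Tuple


def journal_overhead_tb(protected_tb: float,
                        low_percent: int = 5,
                        high_percent: int = 40) -> Tuple[float, float]:
    """Return the (low, high) extra disk space in TB implied by the overhead range."""
    low = protected_tb * low_percent / 100
    high = protected_tb * high_percent / 100
    return low, high


low_tb, high_tb = journal_overhead_tb(100)
print(f"Journal space for 100 TB: {low_tb} TB to {high_tb} TB")
```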
CDP vendors and product selection
Revivio is one of the most notable true-CDP hardware vendors, offering its Continuous Protection System CPS-1200 appliance, which is intended to eliminate backup windows and provide extremely short recovery time objectives (RTOs). The CPS-1200 runs under Revivio's agentless Application Integration Suite (AIS) software, which includes modules for specific applications, like Oracle, Sybase and MS-SQL databases, Microsoft Exchange and Lotus Notes. AIS also integrates with popular backup software, such as Symantec's NetBackup and EMC Corp.'s NetWorker.
The Business Continuity Server (BCS) from Asempra Technologies is another CDP hardware appliance intended to protect a variety of hosts. Unlike Revivio's offering, however, Asempra's BCS installs agents on each protected host that capture transaction events and send the updated application data to storage. Agents are another software element that will have to be restored or updated by IT personnel.
This was first published in August 2006