CDP overview

Tape remains popular for long-term archival storage, but tape technology is hard-pressed to meet the changing needs of business.

Storage volumes are growing at an alarming pace, and tape-based backups are not always practical (or even possible) within an available backup window. Large backup users routinely cite cases where tape backups run beyond a weekend and into the workweek, impacting the production network. Today's businesses are increasingly focused on tighter recovery time objectives (RTOs) and recovery point objectives (RPOs): faster restorations reduce downtime, and more recent recovery points minimise the amount of data that can be lost. Disk storage technologies are systematically replacing tape, spawning specialised storage systems that bring better performance and reliability to the backup process. One emerging data protection technology that has attracted significant attention is continuous data protection (CDP).

Understanding CDP

Unlike a conventional backup, which maintains an outright copy of your data, CDP works by tracking changes to your data, often right down to each individual write. By recording every change in a journal on disk, a storage administrator can literally "rewind" the server or storage array (or any other storage system protected by CDP) to a previous point in time, anywhere from a few seconds to several days earlier. Technicians can use this granularity to recover from a wide range of problems, such as lost files, virus damage or data corruption caused by network or server faults. Some CDP products annotate the activity timeline with actual system events, helping administrators to identify useful restoration points.
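
To make the journalling idea concrete, here is a minimal Python sketch, assuming a toy block store modelled as a dictionary and a journal that records each write with a timestamp. The names `CdpJournal`, `record_write` and `state_at` are hypothetical and do not correspond to any vendor's API; real CDP products journal at the block, file or application level and keep the journal on separate disk.

```python
from dataclasses import dataclass, field


@dataclass
class WriteEvent:
    timestamp: float  # seconds on a hypothetical monotonic clock
    block: int        # block address that was written
    data: bytes       # new contents of that block


@dataclass
class CdpJournal:
    """Toy CDP journal (hypothetical): a baseline copy plus every later write, in order."""
    baseline: dict[int, bytes]
    events: list[WriteEvent] = field(default_factory=list)

    def record_write(self, timestamp: float, block: int, data: bytes) -> None:
        # Every write is appended; earlier versions are never overwritten.
        if self.events and timestamp < self.events[-1].timestamp:
            raise ValueError("writes must be journalled in time order")
        self.events.append(WriteEvent(timestamp, block, data))

    def state_at(self, point_in_time: float) -> dict[int, bytes]:
        # "Rewind": start from a copy of the baseline and replay only the
        # writes made at or before the requested point in time.
        state = dict(self.baseline)
        for event in self.events:
            if event.timestamp > point_in_time:
                break  # events are time-ordered, so nothing later applies
            state[event.block] = event.data
        return state


journal = CdpJournal(baseline={0: b"v0"})
journal.record_write(10.0, 0, b"v1")
journal.record_write(20.0, 0, b"corrupted")
print(journal.state_at(15.0))  # {0: b'v1'}
```

Replaying from a baseline keeps the sketch short; a production system would more likely keep periodic checkpoints so that a rewind does not have to replay the entire journal.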

CDP can be implemented in hardware or software. Software-based CDP typically relies on an agent running on each server you're protecting (e.g. a database server). Hardware-based CDP appliances are also available for use in-band (in the data path) or out-of-band (outside the data path). Hardware can often eliminate the need for agents, though some software may still be required for out-of-band deployments. It's best to discuss implementation requirements with your CDP vendor before making a purchasing decision.

Although CDP can support extremely granular restorations, it cannot preserve the changes or transactions that occur between a fault and its discovery. For example, CDP cannot prevent a virus from infecting a file, but it can restore the infected file to its pre-infection state; the catch is that any legitimate work done to the file after the restore point is lost. Some amount of data recreation may be necessary, depending on the fault and its impact on your data. Consequently, CDP does not eliminate the need for regular backups.
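
The trade-off described above can be illustrated with a small Python sketch, assuming we know when the fault (for example, a virus infection) occurred and when it was discovered. Rolling back to just before the fault undoes the damage, but every legitimate write made between the fault and its discovery is discarded too and has to be recreated. The function `writes_to_recreate` and the sample timeline are hypothetical.

```python
def writes_to_recreate(write_times: list[float], fault_time: float,
                       discovery_time: float) -> list[float]:
    """Return the writes lost when restoring to a point just before fault_time.

    Writes made before the fault survive the restore; writes made from the
    fault up to its discovery are rolled back along with the damage.
    """
    if discovery_time < fault_time:
        raise ValueError("a fault cannot be discovered before it occurs")
    return [t for t in write_times if fault_time <= t <= discovery_time]


# Hypothetical timeline in minutes past 09:00: the user saves edits at these
# times, the file is infected at t=30 and nobody notices until t=90.
edits = [5.0, 20.0, 45.0, 80.0]
lost = writes_to_recreate(edits, fault_time=30.0, discovery_time=90.0)
print(lost)  # [45.0, 80.0]
```

The longer a fault goes unnoticed, the larger this list becomes, which is why the granularity of CDP does not remove the need for data recreation in every case.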

It is also important to note that CDP resembles snapshots in some respects, but the two approaches differ. A snapshot captures a system's state at a particular point in time, much like a single CDP restore point. The difference is that snapshots are discrete events, taken perhaps once or twice a day or even once an hour. When a fault occurs, any data generated between the last snapshot and the fault can be lost, just as with a conventional backup. By contrast, CDP is an ongoing process, recording all activity in real time and allowing restoration to a precise point just before the fault. A snapshot can be visualised as just one "slice" of a CDP timeline.
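
The difference in exposure can be expressed as a simple calculation. The Python sketch below assumes snapshots taken on a fixed schedule starting at time zero and returns how much recent work is lost when a fault occurs: with snapshots it is the time since the last snapshot (up to a full interval), while with CDP it is only whatever the journal had not yet committed, modelled here as a hypothetical `journal_lag`. Both functions are illustrative and are not taken from any product.

```python
import math


def snapshot_data_loss(fault_time: float, interval: float) -> float:
    """Work lost (in the same time units) when restoring from the last snapshot.

    Snapshots are assumed to be taken at 0, interval, 2*interval, ...
    """
    last_snapshot = math.floor(fault_time / interval) * interval
    return fault_time - last_snapshot


def cdp_data_loss(journal_lag: float) -> float:
    """With CDP the restore point can sit just before the fault, so the only
    exposure assumed here is writes not yet committed to the journal."""
    return journal_lag


# Hourly snapshots (interval = 60 minutes), fault at 10:50,
# i.e. 650 minutes after midnight.
print(snapshot_data_loss(650, 60))  # 50
print(cdp_data_loss(0.5))           # 0.5
```

In the worst case, a fault just before the next scheduled snapshot loses almost a full interval of work, whereas the CDP figure does not depend on any schedule.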

CDP products

There are several key vendors in the CDP marketplace, including Revivio, Storactive (recently acquired by Atempo), Asempra Technologies, Mendocino Software, TimeSpring Software, Topio and XOsoft. Each vendor brings its own unique emphasis to the technology and its use in the enterprise.

Companies like Mendocino are touting the idea of manageability, using event annotation to improve utilisation of the CDP timeline. Rather than selecting a restore point based simply on a timestamp, an administrator can select a restore point that corresponds to a more significant system event. Mendocino calls this "event-addressable storage." TimeSpring follows the idea of manageability through "offline replication," allowing offline testing and inspection of data without impacting the production network. This approach also allows protected data to be used for other purposes beyond backups, including business intelligence or lab testing.
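
Selecting a restore point by event rather than by raw timestamp amounts to a lookup on an annotated timeline. The Python sketch below assumes a hypothetical list of `(timestamp, label)` annotations sorted by time and returns the most recent annotation of a requested kind that precedes a known fault. It illustrates the general idea only; it is not how Mendocino or any other vendor implements event-addressable storage, and the event labels are invented.

```python
import bisect
from typing import Optional

Annotation = tuple[float, str]  # (timestamp, event label), hypothetical format


def latest_event_before(timeline: list[Annotation], fault_time: float,
                        label: str) -> Optional[Annotation]:
    """Return the last annotation with the given label strictly before fault_time.

    The timeline must be sorted by timestamp.
    """
    times = [t for t, _ in timeline]
    # Index of the first annotation at or after the fault.
    end = bisect.bisect_left(times, fault_time)
    # Walk backwards from the fault until a matching event is found.
    for t, event in reversed(timeline[:end]):
        if event == label:
            return (t, event)
    return None


timeline = [
    (100.0, "db_checkpoint"),
    (160.0, "app_upgrade_started"),
    (220.0, "db_checkpoint"),
    (300.0, "db_checkpoint"),
]
print(latest_event_before(timeline, fault_time=250.0, label="db_checkpoint"))
# (220.0, 'db_checkpoint')
```

Choosing a consistent application event, such as a database checkpoint, as the restore point can save the administrator from guessing which timestamp left the data in a usable state.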

Numerous vendors are applying CDP technology to specific applications. For example, Storactive has geared its LiveBackup software for Windows-based tasks such as backup/recovery and disaster recovery, while its LiveServ software protects Exchange servers for e-mail backup and recovery. TimeSpring's TimeData software is available in several versions supporting SQL Server, NTFS and Exchange environments. XOsoft provides WANSync software, designed to offer CDP features between remote offices.

Companies like TimeSpring, Storactive, Mendocino and XOsoft implement CDP as software, but Revivio favours hardware implementations in products like its CPS 1200 or CPS 1200i. The CPS is a nondisruptive, block-based, out-of-band appliance intended to protect mission-critical enterprise applications without impairing application performance.

Applications of CDP

The applications of CDP are as varied as the vendors' products. Some users adopt CDP to avoid the time and trouble associated with traditional backups, which frequently run overnight into the workday or through a weekend into Monday, or even fail partway through, forcing administrators to find more backup time or forgo a backup entirely.

A substantial volume of valuable corporate data is often left on laptops or at remote locations that aren't routinely protected by any backup strategy at all, so some CDP users focus on supporting remote users. In most cases, CDP software can protect laptops and remote systems across relatively slow WAN links. When a laptop user loses a file or finds it corrupted, the file can be restored from a CDP platform in the corporate data centre. System administrators often find that the biggest problem is getting remote users to use the CDP capability regularly.
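
CDP can work over relatively slow links because only changed data has to travel, not a full copy of the system. The back-of-the-envelope Python sketch below makes that concrete; the daily change volume, disk size and link speed are hypothetical, and the function ignores protocol overhead, latency and compression.

```python
def transfer_hours(changed_megabytes: float, link_megabits_per_s: float) -> float:
    """Hours needed to ship a volume of data over a WAN link.

    Ignores protocol overhead, latency and compression, so real figures will differ.
    """
    megabits = changed_megabytes * 8  # 1 byte = 8 bits
    seconds = megabits / link_megabits_per_s
    return seconds / 3600


# Hypothetical laptop: 200 MB of changed data per day over a 2 Mbit/s link,
# compared with re-sending a full 40,000 MB (about 40 GB) disk image.
print(round(transfer_hours(200, 2), 2))     # 0.22
print(round(transfer_hours(40_000, 2), 1))  # 44.4
```

Under these assumptions a day's changes cross the link in roughly 13 minutes, while a full copy would take almost two days.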

CDP is also employed to protect specific applications, such as a corporate Microsoft Exchange server or a database such as Oracle or SQL Server. For example, CDP allows lost or deleted e-mails to be recovered far more efficiently than by systematically searching through tape backups: an administrator can simply look back through the CDP logs, find the deleted or lost message and restore it directly from disk.

Finally, CDP is not appropriate for every enterprise. The technology isn't especially difficult to use, but it is expensive, and the CDP paradigm requires a fundamental rethinking of data protection. As a result, CDP is best suited to organisations that want a negligible backup window and a near-zero RPO; enterprises with heavy transactional workloads are often the best fit. Businesses that do not need those benefits may find better value in other disk-based backup technologies, such as virtual tape libraries or snapshots.
