Data protection is changing. It used to mean backing up your data to tape, whether once an hour, once a day, once a week or even once a month. To many IT managers, that is still what data protection means.

Times have changed. With regulatory compliance requirements increasing (worldwide, and escalating especially quickly within the European Union) and the volume of stored data doubling every 18 months or so, traditional data protection methodologies can no longer adequately protect the data.

This problem has led to the development of a plethora of next-generation data protection (NGDP) technologies, including deduplication (aka single instance storage and global single instance storage), virtual tape libraries (VTL), continuous data protection (CDP), continuous snapshot (aka small aperture snapshot) and distributed remote office/branch office (ROBO) backup to disk.

[Note: Many traditional backup product vendors, and some of their customers, may argue vehemently that their products are more than adequate for the challenge. I contend that those that are adequate to meet the challenge have actually developed and deployed NGDP technologies as features of their traditional product.]

Deduplication is one of the more exciting technologies hitting the data protection market. Its value comes from eliminating duplicate stored data, and the amount of that value varies by application. If the application creates a lot of duplicate data, such as full backups or full volume snapshots, the value can be as great as a 99% reduction (25-times compression) in the amount of stored data. This is pretty heady stuff. If the application produces primarily unique data, such as incremental backups or snapshots, the value drops significantly, to approximately 60% to 80% (three-to-four times compression).
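To make the idea concrete, here is a minimal sketch of single instance storage. It is not any vendor's implementation: the `DedupStore` class, the fixed 4 KB chunk size and the use of SHA-256 are all assumptions chosen for illustration. It also shows how a compression ratio converts to a percentage reduction (reduction = 1 - 1/ratio). By that formula a 25-times ratio works out to 96% and a 99% reduction to 100 times, so the figures quoted above are best read as rounded estimates.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size; real products may chunk differently


class DedupStore:
    """Toy single instance store: each unique chunk is kept once, keyed by its hash."""

    def __init__(self):
        self.chunks = {}        # SHA-256 hex digest -> chunk bytes
        self.logical_bytes = 0  # bytes the backup application asked us to store

    def write(self, data: bytes) -> list:
        """Store data and return the list of chunk digests needed to rebuild it."""
        recipe = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # a duplicate chunk costs no new space
            recipe.append(digest)
        self.logical_bytes += len(data)
        return recipe

    def read(self, recipe: list) -> bytes:
        """Rebuild the original data from its chunk digests."""
        return b"".join(self.chunks[d] for d in recipe)

    def physical_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())

    def ratio(self) -> float:
        """Logical bytes written divided by bytes actually stored."""
        return self.logical_bytes / self.physical_bytes()


def reduction_percent(ratio: float) -> float:
    """Convert an N-times compression ratio into a percentage reduction."""
    return (1 - 1 / ratio) * 100


# A hypothetical volume made of 100 distinct chunks, backed up in full 25 times.
volume = b"".join(bytes([i]) * CHUNK_SIZE for i in range(100))
store = DedupStore()
recipes = [store.write(volume) for _ in range(25)]

print("ratio:", store.ratio())
print("reduction %:", round(reduction_percent(store.ratio()), 1))
```

Because every full backup in this toy example is identical, only the first one consumes space. Backups made mostly of new data, such as incrementals, would add new chunks on every pass and the ratio would be correspondingly lower.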

Deduplication can be deployed as a standalone add-on to a data protection implementation (Data Domain, Diligent and ExaGrid). This greatly enhances the incumbent product and avoids the rip-out-and-replace pain. It can also be deployed as an integral feature of a data protection product (Asigra, Avamar, Data Domain, Diligent, EMC, FalconStor, Sepaton, Symantec).

VTL technology is software that enables disk to emulate a tape drive. Its value comes from faster backups to tape (five times faster), more efficient tape utilization -- a 50% increase for Windows, Unix and Linux, and a 900% increase for z/OS -- and a lot less stored data if deduplication is integrated as part of the solution.
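The emulation idea can be sketched in a few lines. The `VirtualTape` class below is hypothetical and greatly simplified: it exposes sequential, tape-style operations (write a block, rewind, read the next block) while keeping the data in a disk-style byte stream. A real VTL emulates tape drive and library commands at the device interface level, which this sketch does not attempt.

```python
import io


class VirtualTape:
    """Toy virtual tape: sequential block access backed by a disk-style byte stream."""

    def __init__(self, backing=None):
        # io.BytesIO stands in for a file on disk; a real file opened in "w+b" mode
        # would work the same way.
        self._backing = backing if backing is not None else io.BytesIO()

    def write_block(self, data: bytes) -> None:
        """Write one block at the current position, prefixed with its 4-byte length."""
        self._backing.write(len(data).to_bytes(4, "big"))
        self._backing.write(data)

    def rewind(self) -> None:
        """Return to the beginning of the tape."""
        self._backing.seek(0)

    def read_block(self):
        """Read the next block, or return None at the end of recorded data."""
        header = self._backing.read(4)
        if len(header) < 4:
            return None
        length = int.from_bytes(header, "big")
        return self._backing.read(length)


tape = VirtualTape()
tape.write_block(b"backup set 1")
tape.write_block(b"backup set 2")
tape.rewind()
first = tape.read_block()
second = tape.read_block()
end = tape.read_block()
```

The backup application sees the same sequential interface it would use with tape, while the writes land on disk.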

VTL products are available from many vendors, including Copan, Data Domain, Diligent, EMC, FalconStor, Fujitsu/Siemens, HP, IBM, Neartek, Network Appliance, Quantum, Sepaton, Spectra Logic and Sun.

CDP is getting a lot of attention as another hot next-generation data protection technology. CDP has a recovery point objective (RPO) of zero -- that is, the user can restore data to the moment just before the data corruption or failure, resulting in no data loss. The RPOs of traditional data protection technologies usually range from six to 24 hours, which can be far too long and can mean too much potential data loss for mission-critical applications. Most CDP products are capable of exceptionally fast data restoration, ranging from a few seconds to several hours depending on the vendor and product.
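One way to picture an RPO of zero is a journal that records every write as it happens, so that any earlier state can be rebuilt by replaying the journal up to a chosen moment. The sketch below is an illustration under that assumption, not a description of how any particular CDP product works. The `CdpJournal` class, its method names and the sample data are all hypothetical.

```python
class CdpJournal:
    """Toy CDP journal: every write is logged, so any past state can be rebuilt."""

    def __init__(self):
        self._entries = []  # (timestamp, key, value) in arrival order

    def record_write(self, timestamp: float, key: str, value) -> None:
        """Log a write; timestamps are assumed to arrive in non-decreasing order."""
        self._entries.append((timestamp, key, value))

    def restore_before(self, moment: float) -> dict:
        """Rebuild the state as it was just before the given moment."""
        state = {}
        for timestamp, key, value in self._entries:
            if timestamp >= moment:
                break  # everything from here on happened at or after the moment
            state[key] = value
        return state


journal = CdpJournal()
journal.record_write(1.0, "orders.db", "v1")
journal.record_write(2.0, "orders.db", "v2")
journal.record_write(3.0, "orders.db", "corrupted")  # the bad write

# Restore to the moment just before the corruption at t=3.0.
recovered = journal.restore_before(3.0)
print(recovered)  # {'orders.db': 'v2'}
```

Every write made before the bad one is recovered, which is what a zero RPO means in practice.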

The value of CDP is that it provides the highest level of data protection available today. It protects 100% of the data against loss as a result of hardware failure, human error, malware, corruption or deletion. Most CDP vendors also provide data and application consistency, meaning database and mail applications are easily recoverable. This is especially important for Microsoft Exchange. CDP is available from Asempra, Asigra, Atempo, CA, CommVault, EMC, FalconStor, FilesX, InMage, IBM, Iron Mountain, Lucid8, Mendocino, Revivio, SonicWALL, Symantec and TimeSpring.

Continuous snapshot (aka small aperture snapshot) is very similar to CDP, with two primary differences. First, continuous snapshots are not exactly continuous: there is a time gap between snapshots, whereas CDP has none. This gap ranges from minutes to days, and it is the period in which data can be lost. Second, because continuous snapshot data capture has gaps between captures, most products (Cloverleaf, Exanet, Network Appliance and Symantec/DCT) do not have the same application consistency attributes as CDP. This means they have to quiesce (pause) application operations for the snapshot data to be in a recoverable form.
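The effect of the gap can be worked out with simple arithmetic: whatever was written after the most recent snapshot and before the failure is lost. The helper functions and the 15-minute schedule below are illustrative assumptions, not figures from any product.

```python
import bisect


def last_snapshot_before(snapshot_times, failure_time):
    """Return the latest snapshot time at or before the failure, or None if there is none.

    snapshot_times must be sorted in ascending order.
    """
    index = bisect.bisect_right(snapshot_times, failure_time)
    return snapshot_times[index - 1] if index else None


def data_loss_window(snapshot_times, failure_time):
    """Return how much time's worth of writes is lost when restoring from snapshots."""
    last = last_snapshot_before(snapshot_times, failure_time)
    if last is None:
        return None  # no usable snapshot exists yet
    return failure_time - last


# Hypothetical schedule: a snapshot every 15 minutes, times given in minutes.
snapshots = [0, 15, 30, 45]
loss = data_loss_window(snapshots, failure_time=44)
print(loss)  # 14
```

In the worst case the loss approaches the full interval between snapshots, so a shorter interval narrows the window but never closes it the way CDP does.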

Generally speaking, continuous snapshot provides similar, though not quite as high, value as CDP.

Distributed ROBO backup to disk is designed from the ground up to protect the data outside the data center (50% to 90% of the organization's data, depending on which report you believe). This is a knotty problem: it requires data center performance and recoverability at sites with little or no data protection skills and limited wide area bandwidth.

Distributed ROBO backup to disk solves these problems by utilizing local and global deduplication, wide area network (WAN) optimization, centralized management and control, local and centralized data recovery, file and data versioning, encryption in flight, encryption at rest and even CDP in some cases. The value of distributed ROBO backup to disk is centralized control with local performance and recoverability. Distributed ROBO backup to disk is available from Asigra, Avamar, eVault, Iron Mountain, Signiant and Symantec.
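The combination of local and global deduplication is what keeps WAN traffic low, and it can be sketched as follows. This is a simplified illustration rather than any vendor's protocol: the `CentralVault` class, the `backup_site` function and the sample chunks are hypothetical, and the byte count ignores the small cost of sending the digests themselves.

```python
import hashlib


class CentralVault:
    """Toy data center store shared by all remote offices."""

    def __init__(self):
        self.chunks = {}  # SHA-256 hex digest -> chunk bytes

    def missing(self, digests):
        """Return the digests the vault does not hold yet."""
        return [d for d in digests if d not in self.chunks]

    def upload(self, chunks_by_digest):
        self.chunks.update(chunks_by_digest)


def backup_site(vault, chunks):
    """Back up one site's chunks and return the payload bytes sent over the WAN."""
    # Local deduplication: identical chunks within the site collapse to one entry.
    by_digest = {hashlib.sha256(c).hexdigest(): c for c in chunks}
    # Global deduplication: ask the vault which chunks it has never seen from any site.
    needed = vault.missing(list(by_digest))
    vault.upload({d: by_digest[d] for d in needed})
    return sum(len(by_digest[d]) for d in needed)


vault = CentralVault()
# Two hypothetical branch offices that share a common chunk.
sent_a = backup_site(vault, [b"os-image", b"os-image", b"report-a"])
sent_b = backup_site(vault, [b"os-image", b"report-b"])
print(sent_a, sent_b)  # 16 8
```

The second office sends only its unique chunk because the shared one is already in the vault, which is how limited wide area bandwidth stops being the bottleneck.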

I will go into greater detail about each of these NGDP technologies in future blog posts.

About the author: Marc Staimer is president and founder of Dragon Slayer Consulting in Beaverton, Ore. He is widely known as one of the leading storage market analysts in the network storage and storage management industries. His consulting practice of six-plus years serves the end user and vendor communities.

This was last published in November 2006



These types of solutions will be commonplace in the near future. With public cloud on the back-end, the only question that remains is how much compute and storage will remain on premises for whatever reason. Even this case could do all the computing for a very large number of small businesses if they use the cloud on the back end.