How to buy data deduplication

Learn what to look for in a data deduplication product used for backup and recovery purposes in this tip.

As more organisations implement disk-based data backup and recovery to overcome the performance and reliability shortcomings of tape-based data backup, data deduplication has emerged as a way to make it economically feasible to retain data longer on disk (possibly eliminating tape) or to increase the number of workloads that use disk as an interim stop on the way to longer-term retention on tape.

Hardware vendors spearheaded dedupe adoption with powerful, purpose-built deduplication appliances that process backup data before or after it's written to disk. Benign to the existing backup environment, this hardware-based approach made deploying dedupe relatively easy. Research from the Enterprise Strategy Group has found that the ability to integrate with existing backup processes and overall ease of use are more important adoption factors to organisations than specific technical considerations, such as a deduplication ratio or the granularity of deduplication.

The importance of seamless integration with existing data protection practices, together with IT's historic resistance to change when it comes to backup software, meant that backup software providers offering deduplication had a more difficult time gaining mindshare in the data center.

When EMC Corp.'s Avamar came to market touting a better, more efficient way to back up data, the company faced an obstacle that was hard to overcome: reluctance to walk away from existing backup applications. IT organisations could clearly understand the benefits, but weren't motivated to initiate a technology change that would have a ripple effect on the operational aspects -- people and process -- of the data protection environment. EMC Avamar has therefore had to take a more circuitous route to the data center, providing a bandwidth- and storage-optimized backup solution for remote and branch offices, as well as an efficient data protection alternative for server virtualisation environments.

However, the integration of acquired deduplication products by EMC (Avamar) and Symantec Corp. (PureDisk) with NetWorker and Veritas NetBackup, respectively, as well as recent introductions of native dedupe by CA, CommVault and IBM, have a lot of IT organisations wondering which is the best implementation of deduplication -- hardware or software? Bottom line: It's not a one-size-fits-all scenario.

Cost, performance, scalability and the deduplication domain are just a few of the considerations when evaluating deduplication in the backup process to determine whether a backup application's built-in dedupe capability or a feature built into a backup storage system will best serve your environment. Here is what to ask about each when choosing a data deduplication product:

Cost. Presumably, an investment made in technology that can reduce storage capacity requirements by a factor of 20 will be easily justified. Still, check what the feature really costs. Is there an added fee to enable it, whether it's a backup app capability or an "add-on" feature in a hardware device? Is an upgrade to a higher version or model required? Even if deduplication is standard in the product (hardware or software), what other cost implications are there for implementing it (e.g., will it require additional network, server or storage resources)?
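To see how added fees can erode the headline saving, the arithmetic can be sketched as follows. This is a rough illustration only: the function name, the prices and the 20:1 ratio are hypothetical figures chosen for the example, not vendor data.

```python
def effective_cost_per_tb(raw_cost_per_tb, dedupe_ratio,
                          extra_costs=0.0, protected_tb=1.0):
    """Cost per TB of protected (pre-dedupe) backup data.

    raw_cost_per_tb: price of one TB of physical disk (assumed figure)
    dedupe_ratio:    e.g. 20.0 for a 20:1 reduction (assumed figure)
    extra_costs:     licence fees, upgrades, extra servers or network
    protected_tb:    logical TB of backup data being retained
    """
    if dedupe_ratio <= 0 or protected_tb <= 0:
        raise ValueError("dedupe_ratio and protected_tb must be positive")
    physical_tb = protected_tb / dedupe_ratio      # disk actually consumed
    total = physical_tb * raw_cost_per_tb + extra_costs
    return total / protected_tb


if __name__ == "__main__":
    # 100 TB protected, disk at 1000 per TB, 20:1 dedupe, no extra fees
    print(effective_cost_per_tb(1000.0, 20.0, 0.0, 100.0))      # 50.0
    # Same setup with 20000 in licence and infrastructure costs
    print(effective_cost_per_tb(1000.0, 20.0, 20000.0, 100.0))  # 250.0
```

In this made-up case the add-on costs raise the effective price per protected terabyte fivefold, which is why the fee and upgrade questions are worth asking even when the reduction ratio looks compelling.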

Data deduplication performance. Deduplication comes in all shapes and sizes as backup workloads have different requirements. Deduplication may be mixed and matched, taking advantage of features of both software and hardware products. Source-side data dedupe in backup software may make the most sense for remote systems because it delivers greater network efficiency, while target-based dedupe approaches may make more sense for workloads with the most stringent data backup windows.
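The network benefit of source-side dedupe can be illustrated with a toy model. The sketch below assumes fixed-size 4 KB chunks and SHA-256 fingerprints, and it stands in for the backup target with an in-memory set; real products use their own chunking, fingerprinting and client-server protocols, so treat every name here as hypothetical.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size; products vary


def chunk_digests(data, chunk_size=CHUNK_SIZE):
    """Yield (fingerprint, chunk) pairs for fixed-size chunks of data."""
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        yield hashlib.sha256(chunk).hexdigest(), chunk


def source_side_backup(data, target_index):
    """Send only chunks the target has not seen; return payload bytes sent.

    target_index is a set of fingerprints standing in for the target's
    chunk index. The small cost of exchanging fingerprints is ignored.
    """
    sent = 0
    for digest, chunk in chunk_digests(data):
        if digest not in target_index:
            target_index.add(digest)   # target stores the new chunk
            sent += len(chunk)
    return sent


if __name__ == "__main__":
    index = set()
    data = b"A" * CHUNK_SIZE * 3 + b"B" * CHUNK_SIZE
    print(source_side_backup(data, index))  # first run: 8192 of 16384 bytes
    print(source_side_backup(data, index))  # unchanged repeat run: 0
```

With a target-based approach, by contrast, all 16,384 bytes would cross the network on every run and be reduced only on arrival, which keeps the work off the backup client at the price of bandwidth.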

Product scalability. While deduplication should mitigate the need to expand storage capacity, the impact of growth on the dedupe environment should be thought through. You need to determine how easy or difficult it is to expand the deployed product, and if expansion will introduce silos of storage (and thereby limit deduplication) and increase management. And does scaling require a forklift upgrade or can it be achieved more seamlessly?

Deduplication domain. You also need to consider the scope of the deduplication effort. Will your dedupe effort be limited to the confines of a single container -- whether it's logical or physical -- or are your goals broader?
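The effect of the deduplication domain, and of the storage silos mentioned under scalability, can be shown with a small comparison. This is a simplified sketch assuming fixed-size chunks and SHA-256 fingerprints; the function names and sample data are invented for illustration.

```python
import hashlib


def unique_chunk_bytes(streams, chunk_size=4096):
    """Bytes stored when all streams share one deduplication domain."""
    seen = set()
    stored = 0
    for data in streams:
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            digest = hashlib.sha256(chunk).digest()
            if digest not in seen:
                seen.add(digest)
                stored += len(chunk)
    return stored


def compare_domains(silos):
    """Return (bytes stored with per-silo dedupe, bytes with one global domain).

    silos is a list of silos, each a list of backup streams (bytes).
    """
    per_silo = sum(unique_chunk_bytes(streams) for streams in silos)
    all_streams = [data for streams in silos for data in streams]
    return per_silo, unique_chunk_bytes(all_streams)


if __name__ == "__main__":
    os_image = b"O" * 8192               # content common to both sites
    site_a = [os_image + b"A" * 4096]
    site_b = [os_image + b"B" * 4096]
    print(compare_domains([site_a, site_b]))  # (16384, 12288)
```

Because each silo keeps its own copy of the shared content, the two-silo layout stores more than a single domain would. The gap grows with the amount of data the containers have in common.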

Such a wealth of deduplication options provides ample choices, but it can also lead to some confusion. Vendors have the opportunity to educate users about deduplication technology in general, and specifically how their own solutions approach the task. And you need to understand your backup environment and requirements before short-listing solutions. Vet the vendors and their products, check their references and, most importantly, test the products using your own data over several backup cycles.
