Data deduplication can dramatically reduce disk capacity requirements, and those savings translate directly into lower costs. But because there are many ways of carrying out deduplication, products have arrived in many forms. Download this free Essential Guide to sorting through data deduping choices to learn about the options you'll have when planning a data deduping project, and to determine how best to implement data deduplication technology in your data centre.
There’s absolutely no doubt that data deduplication can bring enormous savings in backup disk space. By assigning a unique identifier to chunks of data and discarding identical instances that come after them, data deduping can reduce disk capacity requirements by 90% or more.
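The chunk-and-fingerprint mechanism described above can be sketched in a few lines. This is a simplified illustration, not any vendor's implementation: it uses fixed-size chunks and a SHA-256 hash as the unique identifier, whereas real products typically use variable-size chunking and their own fingerprinting schemes.

```python
import hashlib

def dedupe(data: bytes, chunk_size: int = 4096):
    """Split data into fixed-size chunks and store one copy per unique chunk."""
    store = {}   # fingerprint -> chunk bytes, stored once
    recipe = []  # ordered fingerprints needed to rebuild the original data
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        fp = hashlib.sha256(chunk).hexdigest()  # unique identifier for the chunk
        if fp not in store:
            store[fp] = chunk  # first occurrence: keep the bytes
        recipe.append(fp)      # duplicates: keep only a reference
    return store, recipe

def rebuild(store, recipe):
    """Reassemble the original data from stored chunks and the recipe."""
    return b"".join(store[fp] for fp in recipe)

# Highly redundant input: the same 4 KB block repeated 100 times
# shrinks to a single stored chunk plus a list of references.
data = (b"A" * 4096) * 100
store, recipe = dedupe(data)
stored_bytes = sum(len(c) for c in store.values())
```

With repetitive data like backups, the stored bytes can be a tiny fraction of the input; the reduction you actually see depends entirely on how much duplication your data contains.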
Translate those kinds of capacity savings into cash, and there is a compelling case for applying the power of data deduplication to your backups and potentially even your primary data. After all, spending on disk capacity is the single largest item in a storage manager’s budget (see our 2011 Purchasing Intentions Survey), so saving such large percentages makes dedupe a very attractive option.
But, as they say, there’s no such thing as a free lunch, and the potential variables that can affect the outcome of data deduplication product selection and implementation are many, varied and often interlinked. That’s because in the space of the few short years in which data deduping has gone from breakthrough technology to a plateauing of adoption levels, numerous methods of carrying out data deduplication have arisen in product form. In short, just about every data deduplication vendor does it differently.
Probably the first thing you’ll need to assess is the nature of the backup regime into which you want to fit data deduplication. This is going to have a huge impact on the type of product(s) you buy.
If, for example, you aim to use dedupe as part of a remote office backup strategy or to back up virtual machine image files, this puts you in the market for source deduplication. Source dedupe carries out its work at the source server(s), as the name suggests, which makes it suited to use cases where you want to reduce the data volume before transmitting it over the wire.
By contrast, target deduplication is suited to cases where you are happy to transmit all your data before it is processed by the dedupe engine.
Which of these methods will suit you depends on your use case, the network bandwidth you have available and the length of your backup window.
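The bandwidth trade-off between the two approaches can be made concrete with a back-of-the-envelope calculation. The figures below are purely illustrative assumptions, not benchmarks: a hypothetical 1 TB backup and a 10:1 reduction ratio.

```python
def gb_transmitted(backup_size_gb: float, dedupe_ratio: float, source_side: bool) -> float:
    """Approximate data sent over the wire for one backup run.

    dedupe_ratio is the reduction factor (e.g. 10.0 means 10:1).
    Source dedupe shrinks the data before it crosses the network;
    target dedupe sends everything and reduces it on arrival.
    """
    return backup_size_gb / dedupe_ratio if source_side else backup_size_gb

# Illustrative only: a 1 TB backup at an assumed 10:1 reduction ratio.
over_wan_source = gb_transmitted(1000, 10.0, source_side=True)   # 100 GB crosses the wire
over_wan_target = gb_transmitted(1000, 10.0, source_side=False)  # 1000 GB crosses the wire
```

At these assumed figures, source dedupe cuts the transmitted volume tenfold, which is why it suits remote-office links; target dedupe moves the full volume but keeps the processing load off the source servers.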
Similar factors are at play when you come to decide whether to deduplicate data inline or post-process, and here the choices begin to pile on top of one another. You can, for example, deduplicate inline at the source or at the target, and you can deduplicate at the target via inline or post-process methods.
These are just two of the major decision points on the road to purchasing and implementing data deduplication technology. Between these and a final decision, there are—or should be—numerous interlinked questions, including:
- Should you implement software or hardware dedupe?
- What types of data do you want to deduplicate?
- Can you dedupe your primary data?
- Do you need global deduplication, and, if so, how many nodes will you need?
In this Essential Guide, we walk you through the decisions you’ll need to make when starting down the road to data deduping. It’s a complex process, but the rewards can be well worth it.
Antony Adshead is bureau chief of SearchStorage.co.UK.
This was first published in November 2011