How does data deduplication impact backup?

Ask the Expert


Data deduplication impacts backups in different ways depending on the technology deployed. There are essentially two options:


  • Deduplication at the client side.
  • Deduplication at the back-end disk target.

Both options benefit backup because less data resides in backup storage, so backup targets occupy less space and consume less energy for power and cooling.
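As a rough illustration of why either option shrinks the backup footprint, here is a minimal sketch of a deduplicating store. It assumes fixed-size chunks identified by their SHA-256 digest; the `DedupStore` class, the `CHUNK_SIZE` value and the sample data are hypothetical and not taken from any particular product, many of which use variable-size chunking instead.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size, chosen for illustration


class DedupStore:
    """Toy backup target that keeps one copy of each unique chunk."""

    def __init__(self):
        self.chunks = {}        # digest -> chunk bytes
        self.logical_bytes = 0  # total bytes the backup jobs have written

    def write(self, data):
        """Store a backup and return its recipe (ordered list of digests)."""
        recipe = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            # Only the first occurrence of a chunk consumes space.
            self.chunks.setdefault(digest, chunk)
            recipe.append(digest)
        self.logical_bytes += len(data)
        return recipe

    def read(self, recipe):
        """Rebuild a backup from its recipe."""
        return b"".join(self.chunks[d] for d in recipe)

    @property
    def stored_bytes(self):
        return sum(len(c) for c in self.chunks.values())


store = DedupStore()
monday = b"A" * CHUNK_SIZE + b"B" * CHUNK_SIZE
tuesday = b"A" * CHUNK_SIZE + b"C" * CHUNK_SIZE  # one chunk changed
store.write(monday)
tuesday_recipe = store.write(tuesday)
print(store.logical_bytes, store.stored_bytes)
```

In this toy case the two backups total 16,384 logical bytes, but only three unique chunks (12,288 bytes) are kept, and each backup can still be restored in full from its recipe.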

Host-side (client-side) data deduplication can be a boon for remote offices with limited connectivity. Historically, such environments have required expensive, discrete local backup solutions to protect data. Because deduplication reduces the volume of backup data, those backups can instead be sent over the network to a central hub site. However, deduplication at the source carries a processing overhead on the host and, depending on numerous factors (CPU, transaction intensity, etc.), this may cause issues. Because of these scalability limitations, host-side products are not best suited to large operations in the data centre.

It is in the data centre that deduplication at the back end comes into its own. A virtual tape library (VTL) with deduplication capabilities offers a reduced backup data footprint together with disk performance and reliability in the backup space. Depending on the backup data footprint and the depth of an organisation's pockets, these benefits can enable you to replace tape completely, work in conjunction with tape to secure critical data, or act as a staging area with fast restore performance for recent backups.
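To make the remote-office case concrete, the following minimal sketch shows one way a host-side exchange might work: the client hashes its chunks, asks the hub which digests it lacks, and sends only those chunks over the link. The `HubSite` class, `client_backup` function and chunk size are hypothetical names for illustration, and real products use their own protocols. Note that all of the hashing runs on the client, which is where the processing overhead mentioned above comes from.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size, chosen for illustration


def split_chunks(data):
    """Cut a byte string into fixed-size chunks."""
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]


class HubSite:
    """Toy central backup target reached over a slow link."""

    def __init__(self):
        self.chunks = {}  # digest -> chunk bytes

    def missing(self, digests):
        """Return the digests the hub does not hold yet."""
        return {d for d in digests if d not in self.chunks}

    def upload(self, new_chunks):
        self.chunks.update(new_chunks)


def client_backup(data, hub):
    """Back up data to the hub and return the number of chunk bytes sent."""
    # The client pays the CPU cost of hashing every chunk.
    local = {hashlib.sha256(c).hexdigest(): c for c in split_chunks(data)}
    needed = hub.missing(local.keys())
    payload = {d: local[d] for d in needed}
    hub.upload(payload)
    return sum(len(c) for c in payload.values())


hub = HubSite()
first_sent = client_backup(b"A" * CHUNK_SIZE + b"B" * CHUNK_SIZE, hub)
second_sent = client_backup(b"A" * CHUNK_SIZE + b"C" * CHUNK_SIZE, hub)
print(first_sent, second_sent)
```

The first backup sends both chunks; the second sends only the one chunk the hub has not seen, which is the effect that makes backing up over a limited link feasible. The sketch ignores the small cost of transmitting the digests themselves.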

Deduplication within these appliances occurs either in the data path (inline) or as a post-processing activity. The choice between the two is a tradeoff: inline deduplication adds processing overhead to the backup stream, while post-process deduplication needs a larger initial repository because data must land in full before it is reduced.
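The capacity side of that tradeoff can be sketched with a simplified model. It assumes fixed-size chunks and a post-process pass that runs after each backup has landed in full; the function names and sample data are hypothetical, and the model ignores the throughput cost of inline hashing, which is the other half of the tradeoff.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size, chosen for illustration


def chunks_of(data):
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]


def dedupe_into(store, data):
    """Add the unique chunks of data to store (digest -> chunk)."""
    for chunk in chunks_of(data):
        store.setdefault(hashlib.sha256(chunk).digest(), chunk)


def size_of(store):
    return sum(len(c) for c in store.values())


def inline_ingest(backups):
    """Deduplicate in the data path; return (peak_bytes, final_bytes)."""
    store, peak = {}, 0
    for data in backups:
        dedupe_into(store, data)  # only unique chunks ever reach disk
        peak = max(peak, size_of(store))
    return peak, size_of(store)


def post_process_ingest(backups):
    """Land each backup in full, then deduplicate; return (peak, final)."""
    store, peak = {}, 0
    for data in backups:
        # The raw backup sits on disk alongside the deduplicated store.
        peak = max(peak, size_of(store) + len(data))
        dedupe_into(store, data)  # later pass; the landing area is then freed
    return peak, size_of(store)


backups = [
    b"A" * CHUNK_SIZE + b"B" * CHUNK_SIZE,
    b"A" * CHUNK_SIZE + b"C" * CHUNK_SIZE,
]
inline_peak, inline_final = inline_ingest(backups)
post_peak, post_final = post_process_ingest(backups)
print(inline_peak, inline_final, post_peak, post_final)
```

Both approaches end with the same deduplicated footprint in this model, but the post-process variant needs more capacity at its peak because the second backup lands in full before the duplicate chunk is removed.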

Depending on your requirements, deduplication at the back end can introduce further challenges. If you want deduplication to be aware of data held across multiple appliances, you need to ensure the product supports a global topology. In addition, for replication and disaster recovery (DR) you will need a second appliance to replicate to.


This was first published in September 2009

