Data deduplication impacts backups in different ways depending on the technology deployed. There are essentially two options:
- Deduplication at the client side (also referred to here as host-side or source deduplication).
- Deduplication at the back-end disk target.
Both options benefit backup because less data resides in backup storage, which means backup targets occupy less space, consume less power and need less cooling.
Host-side data deduplication can be a saviour for remote offices with limited connectivity. Historically, such environments have required expensive, discrete local backup solutions to protect data, but deduplication shrinks backups enough that they can be sent to a central hub site instead. However, deduplication at the source carries a processing overhead on the host and, depending on factors such as available CPU and transaction intensity, this may cause issues. Because of these scalability limitations, host-side products are not best suited to large operations in the data centre.
It is in the data centre that deduplication at the back end comes into its own. A virtual tape library (VTL) with deduplication capabilities combines a reduced backup data footprint with disk performance and reliability in the backup space. Depending on the backup data footprint and the depth of an organisation's pockets, these benefits can enable you to replace tape completely, to work in conjunction with tape to secure critical data, or to provide a staging area with fast restore performance for recent backups.
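To make the host-side approach concrete, the following is a minimal Python sketch of a client that hashes its data and sends only the chunks the hub site does not already hold. It is an illustration under stated assumptions, not how any particular product works: the fixed 4 KB chunk size, the `BackupHub` class and the `backup` function are all hypothetical names invented for this example, and real products often use variable-size chunking and a network protocol rather than an in-memory dictionary.

```python
import hashlib

CHUNK_SIZE = 4096  # assumption: fixed-size chunks, chosen only for illustration


def chunk_hashes(data: bytes, size: int = CHUNK_SIZE):
    """Split data into fixed-size chunks and pair each with its SHA-256 digest."""
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    return [(hashlib.sha256(c).hexdigest(), c) for c in chunks]


class BackupHub:
    """Hypothetical stand-in for the chunk store at the central hub site."""

    def __init__(self):
        self.store = {}  # digest -> chunk bytes

    def missing(self, digests):
        """Return the set of digests the hub does not hold yet."""
        return {d for d in digests if d not in self.store}

    def put(self, digest: str, chunk: bytes) -> None:
        self.store[digest] = chunk


def backup(data: bytes, hub: BackupHub) -> int:
    """Send only chunks the hub lacks; return the payload bytes sent over the link."""
    hashed = chunk_hashes(data)  # this hashing is the processing overhead on the host
    needed = hub.missing(d for d, _ in hashed)
    sent = 0
    for digest, chunk in hashed:
        if digest in needed:
            hub.put(digest, chunk)
            needed.discard(digest)  # a chunk repeated within this backup is sent once
            sent += len(chunk)
    return sent
```

In this sketch, backing up three distinct 4 KB chunks and then a second version in which only one chunk has changed sends all three chunks the first time and just the changed chunk the second time. The hashing in `chunk_hashes` runs on the client, which is where the CPU cost mentioned above comes from.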
Deduplication within these appliances occurs either in the data path (inline) or as a post-processing activity. The choice between the two is a tradeoff between the processing overhead that inline deduplication adds to the backup stream and the larger initial repository that post-process deduplication needs to hold data before it is reduced.
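The tradeoff can be sketched with a toy Python model that ingests the same stream of chunks both ways and reports the peak and final capacity used. This is a simplification under assumptions: the function names are hypothetical, chunks arrive already split, and the post-process peak counts only the landing area, ignoring any overlap while the deduplication pass runs.

```python
import hashlib


def digest(chunk: bytes) -> str:
    """Fingerprint a chunk with SHA-256."""
    return hashlib.sha256(chunk).hexdigest()


def inline_ingest(chunks):
    """Deduplicate in the data path: every chunk is hashed before it is written."""
    store = {}
    for chunk in chunks:
        store.setdefault(digest(chunk), chunk)  # hashing sits on the write path
    final = sum(len(c) for c in store.values())
    return final, final  # (peak, final): the store only ever holds unique chunks


def post_process_ingest(chunks):
    """Land all data first, then deduplicate as a separate pass."""
    landing = list(chunks)  # written without any hashing delay
    peak = sum(len(c) for c in landing)  # simplification: landing area only
    store = {}
    for chunk in landing:
        store.setdefault(digest(chunk), chunk)
    final = sum(len(c) for c in store.values())
    return peak, final
```

Both functions end with the same final footprint. The difference is that `inline_ingest` pays for hashing while the backup is running, whereas `post_process_ingest` needs enough capacity to hold the full, unreduced backup until the second pass completes.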
Depending on your requirements, deduplication at the back end can introduce further challenges. If you want duplicate data to be recognised across multiple appliances, you need to ensure the product supports a global topology. In addition, replication and disaster recovery (DR) will require a second appliance.
This was first published in September 2009