Source deduplication can reduce the total amount of stored data, and thus sidestep the need for expensive, high-bandwidth inter-site links to replicate data.
- Consider the type of data: Data that has already been compressed typically yields a poor deduplication ratio. Structured data, such as databases, typically contains a good deal of redundant information and is therefore an excellent candidate for deduplication. Unstructured data consisting of many unique files -- images, for example -- will not achieve such good data reduction ratios.
- Consider the load on clients: Source-based data deduplication incurs a performance penalty on the host because it runs before the backup takes place. The overhead scales with the size of the data set: the larger the group of files that must be checked for uniqueness, the longer and more processor-intensive the work on the host.
- Restore considerations: Restoring deduplicated data requires 'rehydration' of the data, which adds overhead to the restore time. Whilst this overhead is negligible for single-file restores, in a situation that involves full site recovery (and hence a large amount of data) it will grow accordingly.
- Cost is always a consideration: While the introduction of data deduplication in this case is driven by the need to reduce the necessary bandwidth (and associated costs) between sites, careful consideration is needed to ensure that the correct technology is chosen to maximise the return on investment. So be certain of the business case for deduplication before proceeding.
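To make the trade-offs above concrete, here is a minimal sketch (not any vendor's implementation) of how chunk-level source deduplication works: data is split into chunks, each chunk is hashed, and only chunks with previously unseen hashes are stored or sent over the wire. The `dedup_store` and `rehydrate` names, the tiny 4-byte chunk size, and the in-memory dictionary standing in for the backup target are all illustrative assumptions.

```python
import hashlib

# Tiny chunk size for demonstration only; real products chunk at ~4 KB-128 KB.
CHUNK_SIZE = 4

def dedup_store(data: bytes, store: dict) -> list:
    """Split data into fixed-size chunks, keep each unique chunk once,
    and return the list of chunk hashes (the 'recipe' for the file).
    Hashing every chunk is the host-side cost noted above."""
    recipe = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # only new chunks consume space/bandwidth
        recipe.append(digest)
    return recipe

def rehydrate(recipe: list, store: dict) -> bytes:
    """Reassemble the original data from its chunk hashes -- the
    'rehydration' step that adds overhead to restores."""
    return b"".join(store[d] for d in recipe)

store = {}
data = b"ABCDABCDABCDXYZW"  # a repeated 4-byte pattern plus one unique chunk
recipe = dedup_store(data, store)
assert rehydrate(recipe, store) == data
ratio = len(data) / sum(len(c) for c in store.values())
print(f"chunks referenced: {len(recipe)}, unique stored: {len(store)}, "
      f"ratio: {ratio:.1f}:1")  # → chunks referenced: 4, unique stored: 2, ratio: 2.0:1
```

The example also shows why highly redundant data deduplicates well (three of the four chunks are identical, so the ratio is 2:1) and why unique data would not: every chunk would have to be stored and transmitted anyway.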
For more on data deduplication technology:
1. Learn about data deduplication for primary storage.
2. Find out how to control data growth with target-based data deduplication.
3. Discover the difference between source and target dedupe.