Post-process de-duplication is faster during the backup itself but requires much more space. It involves taking a disk backup first and then de-duplicating the data afterwards. This places less strain on the system while the backup runs and does not affect backup windows. It also ensures that at least one full copy of the most recent backup remains on disk, which can aid restoration. However, this method requires significantly more free space on the disk or virtual tape library for the process to work. As a result, post-process de-duplication lends itself well to integration with existing backup environments, where there is likely to be a mix of tape and disk, because it has minimal impact on what is already in place.
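To make the mechanics concrete, here is a minimal sketch of the post-process approach in Python, assuming a backup has already landed on disk. The file paths, fixed chunk size and SHA-256 hashing are illustrative assumptions, not how any particular product works; commercial systems typically use variable-size chunking and purpose-built chunk stores.

```python
# A minimal sketch of post-process de-duplication: the backup already
# exists on disk, and de-duplication runs as a separate, later step.
import hashlib
import os

CHUNK_SIZE = 4096  # illustrative; real products often use variable-size chunks

def deduplicate(backup_path: str, store_dir: str) -> list[str]:
    """Split an existing backup into chunks, keep one copy of each unique
    chunk in store_dir, and return the ordered list of chunk hashes
    (a manifest) needed to reconstruct the backup."""
    os.makedirs(store_dir, exist_ok=True)
    manifest = []
    with open(backup_path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            digest = hashlib.sha256(chunk).hexdigest()
            chunk_path = os.path.join(store_dir, digest)
            if not os.path.exists(chunk_path):  # store each unique chunk once
                with open(chunk_path, "wb") as out:
                    out.write(chunk)
            manifest.append(digest)
    return manifest
```

Note that the backup window is untouched: the expensive hashing and comparison happen after the backup completes, at the cost of holding the full, un-deduplicated copy on disk in the meantime.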
Whichever form of de-duplication an organisation uses, it is not the be-all and end-all of an efficient backup system. As unstructured data continues to proliferate, organisations must also ensure that their systems can index, search and retrieve backed-up data correctly and efficiently – otherwise de-duplication simply swaps 1,000,000 pieces of random data for 100,000; the latter is better, but it is still far from ideal.
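Continuing the hypothetical sketch above, restoring a backup is only possible because the manifest acts as an index into the chunk store – a small illustration, under the same assumptions, of why indexing and retrieval matter as much as the space savings.

```python
# A hypothetical restore path using the manifest produced above. Without
# an index mapping each backup to its chunk hashes, the de-duplicated
# store is just an unsearchable pile of blocks.
import os

def restore(manifest: list[str], store_dir: str, out_path: str) -> None:
    """Rebuild the original backup by streaming chunks in manifest order."""
    with open(out_path, "wb") as out:
        for digest in manifest:
            with open(os.path.join(store_dir, digest), "rb") as chunk_file:
                out.write(chunk_file.read())
```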