What are the pros and cons of inline and post process data deduplication?

Chris Reid, consultant with Morse, looks at the pros and cons of inline and post-process data deduplication.

The main factor in choosing between de-duplication technologies comes down to a simple trade-off of speed versus space. Inline de-duplication requires the least space, because redundant data is stripped away as it arrives on the system. With current technology (whether software, hardware or both) this is the slower process, and it lengthens backup windows. As computing technology continues to advance, this may become less of an issue, provided the technology can keep ahead of the growth of information.
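To make the trade-off concrete, here is a minimal sketch of inline de-duplication, assuming fixed-size chunking and an in-memory fingerprint index (class and variable names are illustrative, not from any specific product). Each incoming chunk is fingerprinted before it is written, and only previously unseen chunks are stored:

```python
import hashlib

CHUNK_SIZE = 4096  # fixed-size chunking, for simplicity

class InlineDedupStore:
    """Hypothetical inline de-duplicating store."""

    def __init__(self):
        self.index = {}    # fingerprint -> stored chunk (unique data only)
        self.recipes = {}  # backup name -> ordered list of fingerprints

    def write(self, name, data):
        recipe = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            fp = hashlib.sha256(chunk).hexdigest()
            if fp not in self.index:   # hash lookup on the write path:
                self.index[fp] = chunk # this is the inline speed cost
            recipe.append(fp)
        self.recipes[name] = recipe

    def read(self, name):
        # Reassemble the backup from its recipe of fingerprints.
        return b"".join(self.index[fp] for fp in self.recipes[name])
```

The fingerprint lookup sits directly on the write path, which is why inline de-duplication saves space at ingest time but slows the backup itself.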

Post-process de-duplication is faster but requires much more space. A full backup is first written to disk and only then de-duplicated, so the process places less strain on the system during the backup itself and does not affect backup windows. It also ensures that at least one complete copy of the last backup remains on disk, which can aid data restoration. The trade-off is that significantly more space must be available on the disk or virtual tape library to hold the data before it is de-duplicated. Because it leaves the backup job itself untouched, post-process de-duplication lends itself well to integration into existing backup environments, where there is likely to be a mix of tape and disk.
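The post-process variant can be sketched the same way, assuming a staging area that lands the raw backup first and a separate pass that de-duplicates it later (again, the names are illustrative):

```python
import hashlib

CHUNK_SIZE = 4096

class PostProcessStore:
    """Hypothetical post-process de-duplicating store."""

    def __init__(self):
        self.staging = {}  # backup name -> raw bytes (full landing copy)
        self.index = {}    # fingerprint -> stored chunk
        self.recipes = {}  # backup name -> ordered list of fingerprints

    def write(self, name, data):
        # Ingest at full speed: just land the raw backup on disk.
        self.staging[name] = data

    def dedupe_pass(self):
        # Run later, outside the backup window: fold staged
        # backups into the chunk index, then reclaim the space.
        for name, data in list(self.staging.items()):
            recipe = []
            for i in range(0, len(data), CHUNK_SIZE):
                chunk = data[i:i + CHUNK_SIZE]
                fp = hashlib.sha256(chunk).hexdigest()
                self.index.setdefault(fp, chunk)
                recipe.append(fp)
            self.recipes[name] = recipe
            del self.staging[name]

    def read(self, name):
        # The most recent backup may still be a full copy in staging,
        # which is what makes restores of it straightforward.
        if name in self.staging:
            return self.staging[name]
        return b"".join(self.index[fp] for fp in self.recipes[name])
```

The staging dictionary is the extra space the answer refers to: until `dedupe_pass` runs, the system holds both the full landing copy and the growing chunk index.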

Whichever form of de-duplication an organisation uses, it is not the be-all and end-all of an efficient backup system. As unstructured data continues to grow, organisations must also ensure that their systems can index, search and retrieve backed-up data correctly and efficiently; otherwise de-duplication simply swaps 1,000,000 pieces of random data for 100,000, and while the latter is better, it is still far from ideal.
