Nine data deduplication technology implementation considerations

This tip highlights nine crucial factors to consider when implementing data deduplication technology in your storage environment.

Learn how to Evaluate data deduplication solutions in the first installment of this Expert Advice column.

Data deduplication is a hot-selling technology today, thanks to its high return on investment (RoI). Almost every storage vendor has launched deduplication-enabled systems, claiming deduplication ratios higher than those of its industry peers. However, data deduplication rollouts come with their own challenges. The following aspects should be considered when implementing data deduplication technology.

• If you have opted for source-based data deduplication technology, sizing the dedupe caches on the source servers is very important. The default cache size is adequate for most scenarios, but if server performance suffers after the data deduplication solution is deployed, the caches should be tuned to the size of the data and the applications running on the host. Source-based deduplication vendors provide guides to help you resize the caches in case of performance issues. Ideally, database applications should be backed up without the dedupe option, as the chances of finding redundant data are minimal and deduplication adds overhead on the database server.
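The role of the source-side cache can be sketched with a toy model (illustrative Python only; the class, chunk size and cache limit are hypothetical, not any vendor's actual API). The client fingerprints each chunk and transmits only chunks whose fingerprints are not already cached:

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size


class SourceDedupeClient:
    """Minimal sketch of a source-side dedupe cache (not a vendor API)."""

    def __init__(self, cache_limit=100_000):
        self.cache = set()              # fingerprints of chunks already sent
        self.cache_limit = cache_limit  # undersized caches stop remembering chunks

    def backup(self, data: bytes):
        sent, skipped = 0, 0
        for i in range(0, len(data), CHUNK_SIZE):
            fp = hashlib.sha256(data[i:i + CHUNK_SIZE]).hexdigest()
            if fp in self.cache:
                skipped += 1            # duplicate: only a reference is transmitted
            else:
                sent += 1               # new chunk: full payload must be transmitted
                if len(self.cache) < self.cache_limit:
                    self.cache.add(fp)  # a full cache cannot track new fingerprints
        return sent, skipped


client = SourceDedupeClient()
first = client.backup(b"A" * CHUNK_SIZE * 8)   # 8 identical chunks: 1 sent, 7 skipped
second = client.backup(b"A" * CHUNK_SIZE * 8)  # fully cached: 0 sent, 8 skipped
```

If the cache is sized too small for the host's data set, fingerprints of legitimate duplicates fall outside it and whole chunks are re-sent, which is why tuning it per host matters.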

• To achieve higher dedupe ratios with target-based data deduplication on VTL backup solutions, set the multiplexing parameter to 1. This way, data from only one client is sent to the backup device at a time and divided into chunks. If data from different clients running different operating systems is multiplexed, blocks in different OS formats are chunked together, and the resulting dedupe ratios will be low.
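Why multiplexing hurts can be sketched with a toy model (illustrative Python; block and chunk sizes are hypothetical and much smaller than real ones). With multiplexing set to 1, a repeat backup of unchanged clients produces exactly the same chunks as the first run; with interleaved streams, a different arrival order on the second run produces chunks that never match the first run's:

```python
import hashlib

CHUNK = 32   # dedupe chunk size (toy value)
BLOCK = 16   # backup-app write size, smaller than a dedupe chunk


def fingerprints(stream: bytes) -> set:
    """Fingerprint every fixed-size chunk of a backup stream."""
    return {hashlib.sha256(stream[i:i + CHUNK]).digest()
            for i in range(0, len(stream), CHUNK)}


def interleave(first: bytes, second: bytes) -> bytes:
    """Multiplexing > 1: BLOCK-sized writes from two clients arrive alternately."""
    out = b""
    for i in range(0, len(first), BLOCK):
        out += first[i:i + BLOCK] + second[i:i + BLOCK]
    return out


data_a = bytes(range(128))       # client A's data, unchanged between runs
data_b = bytes(range(128, 256))  # client B's data, unchanged between runs

# Multiplexing = 1: one client at a time, identical layout on every run
run1_seq = fingerprints(data_a + data_b)
run2_seq = fingerprints(data_a + data_b)   # every chunk is a duplicate of run 1

# Multiplexing > 1: arrival order flips between runs, so chunks never line up
run1_mux = fingerprints(interleave(data_a, data_b))
run2_mux = fingerprints(interleave(data_b, data_a))
overlap = run1_mux & run2_mux              # empty: nothing dedupes across runs
```

The unchanged data still yields zero duplicate chunks across multiplexed runs, because chunk boundaries fall across blocks from different clients.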

• Storage lifecycle policies should be implemented with dedupe-enabled storage systems so that backups with infinite retention are moved to tape for long-term storage. This practice ensures that the appliance reaches a steady state, for optimum performance and higher RoI.

• More frequent full backups, as opposed to incremental and differential backups, will lead to a higher deduplication ratio, since more redundant data is presented to (and eliminated by) the dedupe engine. Configure your backup policies accordingly.
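The arithmetic behind this can be sketched with hypothetical figures (a 1 TB data set, 5% daily change, 30-day cycle; the numbers are illustrative, not benchmarks). Either schedule stores roughly the same unique data, but daily fulls present far more logical data, so the reported dedupe ratio is much higher:

```python
# Hypothetical figures: 1 TB (1,000 GB) full backup, 5% daily change, 30-day cycle
full_size = 1000.0  # GB
change = 0.05

# Either way, the appliance stores roughly the same unique data:
# one baseline plus 29 days of changed blocks
stored = full_size + 29 * full_size * change            # 2,450 GB

# Daily fulls: 30 full backups' worth of logical data
logical_fulls = 30 * full_size                          # 30,000 GB
ratio_fulls = logical_fulls / stored                    # roughly 12:1

# Weekly fulls plus daily incrementals: much less logical data, same stored data
logical_incr = 5 * full_size + 25 * full_size * change  # 6,250 GB
ratio_incr = logical_incr / stored                      # roughly 2.6:1
```

The stored capacity is identical in both cases; frequent fulls simply give the dedupe engine more redundant data to eliminate, which is what the ratio measures.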

• In case of a tight backup window, the deduplication process can be scheduled to run after the backup completes, since deduplication adds processing overhead that reduces backup throughput. This post-process deduplication can run during production hours, when backups are not operational. In other cases, use inline deduplication technology, as the storage required will then be less than the actual amount of backed-up data.
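The capacity trade-off can be sketched with hypothetical figures (a 2 TB nightly backup set and a 10:1 achievable dedupe ratio; both numbers are illustrative). Post-process deduplication needs a full-size landing zone to hold the raw backup until dedupe runs, on top of the deduplicated store itself; inline deduplication writes only deduplicated data:

```python
# Hypothetical sizing: 2 TB nightly backup set, 10:1 achievable dedupe ratio
backup_set_gb = 2000.0
ratio = 10.0

# Post-process: full landing zone for the raw backup, plus the deduped store
post_process_gb = backup_set_gb + backup_set_gb / ratio  # 2,200 GB

# Inline: data is deduplicated before it is written, so no landing zone
inline_gb = backup_set_gb / ratio                        # 200 GB
```

Post-process buys backup throughput at the cost of staging capacity; inline buys capacity at the cost of ingest overhead.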

• Ideally, data deduplication on primary storage should not be enabled for mission-critical applications, as dedupe calculation overheads can lead to high response times. Enabling data deduplication technology on network-attached storage (NAS) systems can prove beneficial, as there is a high chance of redundant data among many users and performance is not as critical a factor. The deduplication process should be scheduled outside production hours to minimize the impact on performance.

• File servers, remote servers, user desktops and laptops are the best candidates for deployment of source-based data deduplication technology, since it reduces the data sent over the network. Where bandwidth is not a constraint (as well as for database servers), use target-based deduplication solutions.

• Virtual environments are ideal for implementing data deduplication technology, as multiple operating systems (OS) run on the same physical server, so there is a high chance of redundant OS data and, hence, high dedupe ratios. For instance, when backing up a physical server hosting 10 Windows 2008 virtual machines and 10 Solaris virtual machines, the data deduplication solution will store only one OS copy each for Windows and Solaris, as the base OS image is the same for all virtual machines belonging to the same OS family. Target-based data deduplication solutions will yield higher dedupe ratios here, as data from the various physical servers hosting virtual machines is sent to a single device, where dedupe is performed.
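The base-image effect can be sketched with a toy model (illustrative Python; image contents and sizes are stand-ins, not real OS images). Twenty VM images share two base images, so only one copy of each base survives deduplication:

```python
import hashlib

CHUNK = 16  # toy chunk size


def unique_chunks(streams):
    """Count distinct chunk fingerprints across all backed-up streams."""
    seen = set()
    for s in streams:
        for i in range(0, len(s), CHUNK):
            seen.add(hashlib.sha256(s[i:i + CHUNK]).digest())
    return len(seen)


# Stand-ins for the two base OS images (4 distinct chunks each)
base_windows = bytes(range(0, 64))
base_solaris = bytes(range(64, 128))

# 10 VMs per OS: each VM image = shared base image + one chunk of unique data
vms = [base_windows + bytes([n]) * CHUNK for n in range(10)] \
    + [base_solaris + bytes([n + 200]) * CHUNK for n in range(10)]

total = sum(len(v) // CHUNK for v in vms)  # 20 VMs x 5 chunks = 100 logical chunks
stored = unique_chunks(vms)                # 4 + 4 + 20 = 28 physical chunks
```

All 20 VM images dedupe down to the two base images plus each VM's unique data, and the ratio grows as more VMs share the same base.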

• To get higher dedupe ratios, divide backup clients into batches and start the backups one after another. Less data will then travel across the local area network (LAN), leading to optimal utilization of your LAN/WAN link: when clients are backed up in batches, redundant data does not travel across the LAN, whereas if all backups are triggered at once, the data travels across the LAN to the backup destination and only then gets deduplicated.


About the author: Anuj Sharma is an EMC Certified and NetApp accredited professional. Sharma has experience in handling implementation projects related to SAN, NAS and BURA. One of his articles was published globally by EMC and featured as the Best of EMC Networker at last year's EMC World, held in Orlando, US.


This was last published in February 2011
