Effective backup performance tuning and moving bottlenecks in the backup path

Tuning throughput is an important part of running a successful backup operation.

Performance tuning for backup operations can help remove, or at least move, the bottlenecks that cause backup jobs to stumble or falter. A backup that is slowed down for any reason can overrun its allocated window and spill into working hours, when production access to data is expected. This becomes even more critical when backups affect the customer experience. For instance, if you are backing up a database cold, there is no flexibility: the backup has to finish before the start of the working day.

Tuning backups is a challenge, and for many it is an interesting exercise in problem-solving. For me, a successful performance tuning exercise that ends with a single client backing up several terabytes in just a few hours is extremely satisfying. Internally, we occasionally compare who has configured the fastest backup (boys will be boys, and I believe the current record stands at 95MB/sec for a single data stream to an LTO-3 tape drive). But what about the 90% of clients in the same environment who are using shared infrastructure and struggling at 5MB/sec?
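To put those two figures side by side, here is a rough sketch of how long the same job would take at each rate. The 2TB data size is an assumption chosen purely for illustration, and decimal units (1TB = 1,000,000MB) are assumed throughout.

```python
def backup_hours(data_tb: float, rate_mb_per_sec: float) -> float:
    """Hours needed to move data_tb terabytes at a sustained rate (decimal units)."""
    if rate_mb_per_sec <= 0:
        raise ValueError("rate must be positive")
    data_mb = data_tb * 1_000_000  # 1 TB = 1,000,000 MB
    return data_mb / rate_mb_per_sec / 3600


# Assumed 2 TB client; the two rates are the ones quoted in the text.
for rate in (95, 5):
    print(f"{rate} MB/sec: {backup_hours(2, rate):.1f} hours")
```

Under these assumptions the fast client finishes in just under six hours, while the slow one needs more than four and a half days.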

Backup performance tuning is all about moving the bottleneck from one place to another until you are satisfied that things are running at their maximum speed. The bottleneck in a typical network-based backup could be located at the client disk, its host-to-disk connection (SCSI/SAN), the HBAs, system resources, within the network (wire speeds, switches and routers, routing protocols and firewalls), the network interface on the client or the server, server system resources, server HBAs and the backup medium itself (tape, tape drives, optical drives, disks etc.). Once a bottleneck has been identified and the underlying problem is resolved, it moves somewhere else.
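One way to picture this is to treat the backup path as a chain of stages, where the end-to-end rate can be no higher than the slowest stage. The sketch below is only an illustration: the stage names follow the list above, but every rate is a hypothetical figure, not a measurement.

```python
def find_bottleneck(stages: dict[str, float]) -> tuple[str, float]:
    """Return the slowest stage and its rate; the path can go no faster than this."""
    name = min(stages, key=stages.get)
    return name, stages[name]


# Hypothetical sustained rates in MB/sec for each stage of one backup path.
path = {
    "client disk": 60.0,
    "client NIC": 12.0,
    "network": 110.0,
    "server NIC": 110.0,
    "tape drive": 80.0,
}

print(find_bottleneck(path))  # the client NIC limits the whole path

# Resolve that stage (say, with a faster interface) and the bottleneck moves.
path["client NIC"] = 110.0
print(find_bottleneck(path))  # now the client disk is the limit
```

In practice the rates would come from measuring each stage in isolation, which is usually the hard part of the exercise.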

So where is the best place for that bottleneck to reside? Historically, I would have suggested that your tape drive should be your bottleneck, as tape drives dislike being trickle fed. Some tape drives step their throughput down by adjusting speeds; others pad the tape with zeros to fill in gaps between data. Generally, if you don't send enough data to a tape drive, it shoe-shines, which will rapidly reduce the mean time between failures of both your tapes and your drives. However, as disk becomes a more popular primary destination, the option to bottleneck with the backup medium diminishes.

With the removal of moving parts that require special consideration, the question of where your bottleneck should reside arises again. Do you continue tweaking backup performance until you are completely happy, or until your budget has been exhausted? There are many products on the market that enhance backup performance. LAN-free and clientless backups can eliminate network and client-side problems, and snapshots/mirror copies can enhance contiguous read performance. Using disk as a primary backup medium can improve resilience as well as performance, whilst block-based backup software can enhance read speeds; however, these all come at a cost.

In reality, the answer is in the question. The data/application owners (the customers) ultimately decide how long a backup takes and therefore decide how long we spend identifying and moving bottlenecks. They define the service level agreements (SLAs), which include how long a service can be unavailable in the event of a disaster: the Recovery Time Objective (RTO).

The SLA might also specify a window of opportunity for backups, e.g. after the end of batch processing and before the start of the working day. The RTO doesn't tell us how fast to back up, though a well-designed backup solution should be focused on data restoration as much as data backup, and fast backups generally mean fast restores (data that takes 5 days to back up will not be restored in just a couple of hours). When the customers have stated their requirements and stumped up the necessary cash, then it's time to put the thinking cap on and problem-solve your way to happiness.
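Once a window has been agreed, it translates directly into a sustained throughput target. The following is a minimal sketch of that arithmetic; the 1,500GB data size and the 23:00 to 06:00 window are assumed values for illustration, and decimal units (1GB = 1,000MB) are assumed.

```python
from datetime import datetime, timedelta


def required_rate_mb_per_sec(data_gb: float, window_start: str, window_end: str) -> float:
    """Sustained MB/sec needed to move data_gb within a window given as HH:MM times."""
    start = datetime.strptime(window_start, "%H:%M")
    end = datetime.strptime(window_end, "%H:%M")
    if end <= start:
        end += timedelta(days=1)  # the window crosses midnight
    window_seconds = (end - start).total_seconds()
    return data_gb * 1000 / window_seconds  # 1 GB = 1,000 MB


# Assumed example: 1,500 GB to protect between 23:00 and 06:00.
rate = required_rate_mb_per_sec(1500, "23:00", "06:00")
print(f"Required sustained rate: {rate:.1f} MB/sec")
```

If the required rate is higher than the slowest point in the backup path can deliver, there is more tuning (or more spending) to do before the window can be met.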

About the author: David Boyd is a senior consultant at Glasshouse Technologies (UK), a global provider of IT infrastructure services. He has over 7 years' experience in backup and storage, with a major focus on designing and implementing backup solutions for blue chip companies.

