IBM's Tivoli Storage Manager (TSM) has been a popular data backup software product among enterprise-class IT departments for more than 15 years. But its configuration flexibility can also complicate matters for inexperienced backup administrators.
There are many aspects to consider when deploying a TSM backup environment. Beyond the basic requirements common to most backup products, such as recovery time and recovery point objectives (RTO and RPO), network bandwidth and capacity, some requirements are specific to TSM because of its architecture. The following are generally accepted best practices for infrastructure design and implementation items specific to TSM.
1. Disk pool sizing
Tivoli Storage Manager uses disk storage pools to stage backup data and later migrate it to tape. Among other things, this allows multiple concurrent backups that do not depend on the number of tape devices available. The disk pool(s) must be large enough to hold at least one night's worth of the backup data that is directed to disk. Otherwise, the pool can fill up during the backup window and trigger data migration to tape automatically rather than on schedule, which interferes with other processes.
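The sizing rule above comes down to simple arithmetic. The following Python fragment is a minimal sketch, not a TSM tool: the node names, the nightly volumes and the 20% headroom figure are hypothetical values chosen for illustration.

```python
def required_disk_pool_gb(nightly_backup_gb, headroom_pct=20):
    """Estimate a disk pool size (GB) that holds one night's backups.

    nightly_backup_gb: mapping of node name -> GB sent to the disk pool per night
    headroom_pct: extra capacity, as a percentage, kept as a safety margin
                  (an assumed figure, not a TSM default)
    """
    if headroom_pct < 0:
        raise ValueError("headroom_pct must not be negative")
    nightly_total = sum(nightly_backup_gb.values())
    return nightly_total * (100 + headroom_pct) / 100


# Hypothetical nightly volumes for three nodes whose backups land on disk.
nightly = {"fileserver": 400, "mailserver": 250, "dbserver": 150}
print(required_disk_pool_gb(nightly))  # 960.0
```

Data that is sent directly to tape should be left out of the input, since only the volume directed to disk counts toward the pool size.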
2. Data deduplication on disk storage
Although data deduplication works with disk storage, using the technology as a staging disk target with TSM is not a good idea. The best deduplication ratios are achieved over time, so the technology is better suited to longer-term storage. The transient nature of data that is staged to disk before being migrated to tape makes it a poor candidate for deduplication. Deduplication-capable storage yields better data reduction when it backs a storage pool configured as the last data migration destination, in the same way tape storage would be.
3. Tape subsystem sizing and TSM
The tape subsystem must be sized to hold all the backup data to be kept onsite. Lack of capacity will force the ejection of full tapes to make room for empty (scratch) tapes; this practice interferes with TSM's automated processes and usually leads to media management nightmares. When sizing a library, you must consider the amount of backup data to be kept onsite, capacity for scratch tapes and future growth. Clearly defining the backup data retention policy before sizing the tape subsystem for capacity is a must. For example, if a 300 GB system receives a daily full backup that is retained for 15 days, this system alone can use up to 15 times its own size (up to 4.5 TB) in tape library storage capacity. In this context, sizing a tape library should not be an afterthought.
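The 300 GB example can be worked through in a few lines of Python. This is only a sketch: the 800 GB cartridge capacity is an assumption made for illustration, and the cartridge count ignores compression, scratch tapes and future growth.

```python
import math


def retained_full_backups_gb(system_gb, retained_copies):
    """Worst-case capacity (GB) for daily full backups kept for N days."""
    return system_gb * retained_copies


def cartridges_needed(data_gb, cartridge_gb):
    """Whole cartridges required, assuming no compression and full tapes."""
    return math.ceil(data_gb / cartridge_gb)


onsite_gb = retained_full_backups_gb(300, 15)
print(onsite_gb)                          # 4500 GB, i.e. 4.5 TB
print(cartridges_needed(onsite_gb, 800))  # 6
```

Repeating the calculation per system and adding slots for scratch tapes and growth gives a first estimate of the library size.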
The tape library should also be configured with enough tape devices to accommodate all direct backups to tape, data migration, storage pool backups for offsite, tape reclamation and restore requests for any given daily cycle. This is where technologies like virtual tape libraries (VTLs), combined with data deduplication, can come to the rescue by helping address both capacity issues and peak demand for tape devices.
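One way to estimate the number of tape devices is to lay out the daily cycle and find the moment of peak demand. The sketch below does this in Python; the task windows and drive counts are hypothetical, with hours counted from the start of the daily cycle.

```python
def peak_drives(tasks):
    """Return the highest number of drives in use at any one time.

    tasks: iterable of (start_hour, end_hour, drives) tuples.
    """
    events = []
    for start, end, drives in tasks:
        events.append((start, drives))   # drives claimed at the start
        events.append((end, -drives))    # drives released at the end
    # At the same hour, process releases (negative) before claims.
    events.sort()
    peak = current = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


# Hypothetical daily cycle: (start_hour, end_hour, drives)
daily_cycle = [
    (0, 6, 2),    # direct backups to tape
    (6, 10, 2),   # migration from the disk pool
    (8, 12, 2),   # storage pool backup for offsite
    (12, 18, 2),  # tape reclamation
    (0, 24, 1),   # one drive held in reserve for restore requests
]
print(peak_drives(daily_cycle))  # 5
```

In this made-up schedule the overlap between migration and the storage pool backup drives the peak, so moving either window would lower the drive count.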
4. Tape storage pool collocation
Collocation is TSM's ability to store data from individual systems on separate tapes (the opposite of multiplexing), which means data belonging to one system is not mixed with other systems' data across numerous tapes. However, collocation can result in poor media utilization when backing up smaller systems, which seriously impacts library capacity. Tivoli Storage Manager now offers a feature known as collocation groups, a compromise between no collocation and system-level collocation that enables collocation by user-defined groups of systems. While collocation significantly reduces backup data fragmentation, its effect on tape media utilization means it must be used wisely.
System-level collocation for small systems that do not have enough data to ever fill high-capacity tape media is a poor practice. Picture a 10 GB Web server backing up to its own 800 GB LTO-4 tape. This tape volume would occupy a library slot without ever reaching a significant utilization percentage. It should be noted that replacing tape storage with VTL or disk deduplication technology can significantly reduce the negative impact of both backup data fragmentation and small systems collocation on tape pools.
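The utilization problem is easy to quantify. The Python sketch below uses the 10 GB Web server and 800 GB tape from the example above; the sizes of the other systems in the group are hypothetical.

```python
def tape_utilization_pct(data_gb, tape_capacity_gb):
    """Percentage of a tape volume's capacity that holds data."""
    return 100 * data_gb / tape_capacity_gb


# System-level collocation: the 10 GB Web server alone on an 800 GB tape.
print(tape_utilization_pct(10, 800))  # 1.25

# Grouping the Web server with three other small systems
# (hypothetical sizes, in GB) on the same tape.
group_gb = [10, 25, 40, 60]
print(tape_utilization_pct(sum(group_gb), 800))  # 16.875
```

Grouping small systems raises utilization per tape, at the cost of mixing their data on the same volume.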
5. Backup data change rate and retention
The rate at which data changes and how long backup data is retained are the most important factors to consider for TSM capacity planning. Data change rate has a direct impact on the volume of daily backup data, which in turn dictates network bandwidth and the overall performance requirement of the backup infrastructure. The number of backup copies kept (versions) and how long they are retained, on the other hand, directly determine the backup data storage capacity (disk, tape or VTL). Versioning and retention should be defined based on business recovery requirements rather than convenience and "nice to have." Other than retention parameters imposed by regulatory compliance requirements, best practice is to define modest retention settings at first and increase them as necessary rather than starting large without ever knowing whether it is too much.
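To see how the two factors interact, consider the rough Python model below. It is a deliberately simplified sketch and not a TSM formula: it assumes one full copy of the data plus one day's worth of changed data for each additional retained version, and the 1,000 GB size and 5% daily change rate are hypothetical.

```python
def daily_backup_gb(primary_gb, change_rate_pct):
    """Approximate volume (GB) of changed data backed up each day."""
    return primary_gb * change_rate_pct / 100


def stored_backup_gb(primary_gb, change_rate_pct, versions):
    """Approximate stored capacity (GB) under the simplified model:
    one full copy plus (versions - 1) days of changed data."""
    if versions < 1:
        raise ValueError("versions must be at least 1")
    changed = daily_backup_gb(primary_gb, change_rate_pct)
    return primary_gb + changed * (versions - 1)


# Hypothetical environment: 1,000 GB of primary data, 5% daily change.
print(daily_backup_gb(1000, 5))        # 50.0  (drives bandwidth needs)
print(stored_backup_gb(1000, 5, 14))   # 1650.0 (14 versions retained)
print(stored_backup_gb(1000, 5, 30))   # 2450.0 (30 versions retained)
```

Under these assumptions the daily volume depends only on the change rate, while going from 14 to 30 retained versions adds 800 GB of stored data.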
6. Policy domains and management classes in Tivoli Storage Manager
This is an area where things can become complex and difficult to manage. At a high level, Tivoli Storage Manager manages systems, backup schedules, storage destinations and backup data retention based on logical groupings known as policy domains and management classes. Unless there are specific business reasons to treat backup data differently in terms of where it is stored and how long it is retained, it is preferable to keep the number of policy domains, management classes and backup schedules to a minimum for simplicity. Too many policies make the environment overly complex, difficult to manage and error prone.
7. Client option sets
The TSM backup clients can be configured with numerous settings and options that reflect site-specific backup policies or apply to specific systems. While the TSM clients all depend on a local options file (dsm.opt) for basic settings such as the TSM server IP address, it is good practice to create client option sets on the TSM server for system-specific or group-specific configuration options. In large environments with many TSM nodes (clients), it is much easier to manage client option sets centrally than to maintain many individual options files spread across the environment.
8. TSM database and logs
The TSM database must be backed up daily; ideally, it should be backed up twice a day with one copy sent offsite and the other kept onsite for rapid restores. If roll-forward logging mode is enabled, there must always be scratch tapes available for database backups to avoid having the logs reach the 13 GB limit, which will force TSM to a hard stop until the logs are cleared by a database backup.
9. Storage pool backups
This cannot be stressed enough: all primary disk and tape storage pools must be backed up to a copy storage pool on a daily basis. If a VTL technology is used, data must be replicated at the array level or Tivoli Storage Manager must create a copy on another medium such as tape. Regardless of the method, best practice dictates that there should never be only a single copy of a backup. Furthermore, the additional copy must be stored offsite.
10. Backup agents for applications
Although it can be tempting to save on licensing costs by not using TSM backup agents for applications, doing so rarely pays off over time. Not using agents often requires an application shutdown to get a clean backup copy. It also often means taking a full backup every time instead of benefiting from the incremental capabilities provided by application-specific backup agents.
11. Disaster Recovery Manager
The Disaster Recovery Manager (DRM) module of TSM should be fully configured with all "instructions" files documented and maintained. Every time the Tivoli Storage Manager "prepare" process runs, the Disaster Recovery Manager produces a TSM recovery plan file that contains all the TSM configuration information, along with a list of offsite tapes and database backup volumes. This information is essential to recover the TSM server in the event of a total failure. Obviously, the DRM plan file must be sent offsite daily with the copy storage pool tapes and database backup.
More information on TSM best practices can be found in the Deployment Guide Series: IBM Tivoli Storage Manager