The storage area network (SAN) centralizes enterprise storage by interconnecting storage devices and subsystems through a dedicated high-speed network fabric, such as ESCON or Fibre Channel (FC). A SAN can also extend beyond the local data center, connecting storage systems at remote geographic locations through WAN links like ATM or SONET. Once implemented and configured, the SAN's storage resources can be managed centrally, allowing administrators to organize, provision and allocate that storage to users or applications operating on the network across an organization. Centralization also allows administrators to monitor performance, troubleshoot problems and manage the demands of storage growth. If you're new to SAN technology, or just need to refresh the basics, this guide covers the essential concepts of configuration, provisioning, performance and capacity management, and monitoring and troubleshooting.
RAID technology serves two purposes in the disk array or server: it can improve storage I/O performance through striping, and it can bring redundancy to the RAID group through mirroring and parity techniques. When implementing RAID, it's necessary to select an appropriate RAID level and specify a RAID group size (the number of disks committed to the group). For example, use RAID-0 striping when top performance is essential, keeping in mind that it offers no redundancy. RAID-1 mirrors the contents of one disk to another but uses twice the number of disks. Other RAID levels protect disk groups by striping parity information across the disks in the group. RAID-5 gives up one disk's worth of capacity to parity data, while RAID-6 gives up two, so both need less disk overhead than mirroring to protect important data.
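To make the overhead trade-off concrete, here is a minimal Python sketch that computes how much of a RAID group remains usable at each level. The function names are hypothetical, and the sketch assumes equal-size disks and treats RAID-1 groups as simple mirrored pairs; real arrays may impose other limits on group size.

```python
def usable_disks(level: str, group_size: int) -> int:
    """Return how many disks' worth of capacity is left for user data.

    Assumes equal-size disks; RAID-1 is modeled as mirrored pairs.
    """
    if level == "RAID-0":
        minimum, usable = 2, group_size          # striping only, no redundancy
    elif level == "RAID-1":
        if group_size % 2:
            raise ValueError("RAID-1 is modeled here as mirrored pairs")
        minimum, usable = 2, group_size // 2     # every disk has a mirror
    elif level == "RAID-5":
        minimum, usable = 3, group_size - 1      # one disk's worth of parity
    elif level == "RAID-6":
        minimum, usable = 4, group_size - 2      # two disks' worth of parity
    else:
        raise ValueError(f"unsupported RAID level: {level}")
    if group_size < minimum:
        raise ValueError(f"{level} needs at least {minimum} disks")
    return usable


def overhead_fraction(level: str, group_size: int) -> float:
    """Fraction of raw capacity given up to protection."""
    return 1 - usable_disks(level, group_size) / group_size


if __name__ == "__main__":
    for level in ("RAID-0", "RAID-1", "RAID-5", "RAID-6"):
        print(level, f"{overhead_fraction(level, 8):.1%} overhead in an 8-disk group")
```

In an eight-disk group, mirroring costs half the raw capacity, while RAID-5 and RAID-6 cost one-eighth and one-quarter respectively. The parity overhead shrinks as the group grows, but a larger group also puts more disks at risk during a rebuild.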
Rebuild time is a serious issue when configuring RAID arrays. When a disk fails, it takes time to rebuild the failed disk's contents onto a replacement, and during the rebuild the RAID group is either inaccessible or operates at reduced performance. As disk capacities have burgeoned, rebuild times have become problematic. Now that SATA disks routinely exceed 500 GB, a failed disk can take hours to rebuild, leaving the RAID array exposed to additional disk failures and data loss in the meantime. Look for disk arrays that offer fast rebuild times and predictive fault features that can start a rebuild to a spare disk before a complete disk failure occurs.
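A rough lower bound on rebuild time follows from simple arithmetic: disk capacity divided by the sustained rebuild rate. The sketch below assumes a hypothetical rebuild rate supplied by the caller; actual rates depend on the controller, the RAID level and how much production I/O the group is serving, so treat the result as an estimate only.

```python
def rebuild_hours(disk_capacity_gb: float, rebuild_rate_mb_s: float) -> float:
    """Estimate the hours needed to rebuild one failed disk.

    disk_capacity_gb  -- capacity of the failed disk (decimal GB)
    rebuild_rate_mb_s -- assumed sustained rebuild rate in MB/s
    """
    if disk_capacity_gb <= 0 or rebuild_rate_mb_s <= 0:
        raise ValueError("capacity and rebuild rate must be positive")
    seconds = disk_capacity_gb * 1000 / rebuild_rate_mb_s  # 1 GB = 1000 MB
    return seconds / 3600


if __name__ == "__main__":
    # 50 MB/s is an illustrative figure, not a measured one.
    print(f"{rebuild_hours(500, 50):.1f} hours")
```

With these illustrative numbers, a 500 GB disk takes just under three hours to rebuild, and the estimate grows in direct proportion to disk capacity.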
Another issue arises when the RAID setup has to change. Traditionally, a RAID group was a static entity once a level and group size were selected. To change a RAID level or group size, the group would have to be rebuilt from scratch using the new size and level, and then reloaded from a backup. An increasing number of RAID platforms support dynamic RAID groups, allowing administrators to change levels and group sizes on the fly.
The central challenge with a SAN is to centralize storage while restricting access to authorized users or applications; the entire storage environment should not be accessible to every user. Administrators must carve up the storage space into segments that are only accessible to specific users. This management process is known as provisioning. For example, some amount of data center storage may be provisioned for an Oracle database that might only be accessible to a purchasing department, while other space may be apportioned for personnel records accessible to the human resources department.
The major challenge with provisioning relates to storage utilization. Once space is allocated, it cannot easily be changed. Thus, administrators typically provision ample space for an application's future use. Unfortunately, storage capacity that is provisioned for one application cannot be used by another, so space that is allocated, but unused, is basically wasted until called for by the application. This need to allocate for future expansion often leads to significant storage waste on the SAN. One way to alleviate this problem is through thin provisioning, which essentially allows an administrator to "tell" an application that some amount of storage is available but actually commit far less drive space -- expanding that storage in later increments as the application's needs increase.
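The idea behind thin provisioning can be sketched as a toy model in Python. The `ThinPool` class and its method names are hypothetical and do not correspond to any vendor's API; the sketch only illustrates how the size advertised to an application can exceed the physical space actually committed, with commitments growing in fixed increments as data is written.

```python
import math


class ThinPool:
    """Toy model of a thin-provisioned storage pool (sizes in GB)."""

    def __init__(self, physical_gb: int, increment_gb: int):
        self.physical_gb = physical_gb    # real disk space in the pool
        self.increment_gb = increment_gb  # growth step for each volume
        self.advertised = {}              # size each application is "told"
        self.committed = {}               # physical space actually set aside
        self.used = {}                    # data written so far

    def create_volume(self, name: str, advertised_gb: int) -> None:
        # Creating a volume commits no physical space up front.
        self.advertised[name] = advertised_gb
        self.committed[name] = 0
        self.used[name] = 0

    def free_gb(self) -> int:
        return self.physical_gb - sum(self.committed.values())

    def write(self, name: str, gb: int) -> None:
        new_used = self.used[name] + gb
        if new_used > self.advertised[name]:
            raise ValueError("write exceeds the advertised volume size")
        shortfall = new_used - self.committed[name]
        if shortfall > 0:
            # Grow in whole increments, but never past the advertised size.
            grow = math.ceil(shortfall / self.increment_gb) * self.increment_gb
            grow = min(grow, self.advertised[name] - self.committed[name])
            if grow > self.free_gb():
                raise RuntimeError("pool exhausted: add physical capacity")
            self.committed[name] += grow
        self.used[name] = new_used


if __name__ == "__main__":
    pool = ThinPool(physical_gb=500, increment_gb=100)
    pool.create_volume("oracle", advertised_gb=1000)
    pool.create_volume("hr", advertised_gb=1000)
    pool.write("oracle", 30)
    print(pool.committed["oracle"], pool.free_gb())
```

The model also shows the risk that comes with the technique: because the advertised total can exceed the physical pool, administrators have to watch free space and add disks before the pool runs out.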
Provisioning is accomplished through the use of software tools. Tools typically accompany major storage products. For example, EMC's Celerra includes Automated Volume Management software. The issue for administrators is to seek a provisioning tool that offers heterogeneous support that covers the storage platforms currently in their environment. Otherwise, IT staff will need to use multiple provisioning tools, increasing management difficulty.
SAN performance and capacity management
SAN performance can be adversely affected when storage runs low, resulting in application performance problems and service level issues. Many IT organizations guard against this threat by overbuying and overprovisioning storage, but this frequently results in wasted capital since the additional storage investment is not necessarily utilized. Organizations are embracing performance and capacity planning practices to avoid unexpected storage costs and disruptive upgrades. The goal is to predict storage needs over time and then budget capital and labor to make regular improvements to the storage infrastructure.
In actual practice, SAN performance and capacity planning can be extremely difficult. It's virtually impossible to predict the storage needs of an application or department over time without a careful assessment of past growth and a comprehensive evaluation of future plans. In fact, many organizations forgo the expense and effort of a formal process unless a mission-critical project or serious performance problem demands it. Organizations that do choose to sustain an ongoing performance and capacity planning effort will need comprehensive storage resource management (SRM)-type tools, like ControlCenter software from EMC Corp., HiCommand Storage Services Manager from Hitachi Data Systems Inc. (HDS) or Storage Horizon software from MonoSphere Inc.
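As a simple illustration of assessing past growth, the following Python sketch fits a straight line to monthly usage samples and estimates how long the current capacity will last. The function name and the sample figures are hypothetical, and a linear trend is an assumption; it covers only historical growth and says nothing about future projects, which still have to be evaluated separately.

```python
from typing import Optional, Sequence


def months_until_full(history_tb: Sequence[float],
                      capacity_tb: float) -> Optional[float]:
    """Estimate months of headroom from consecutive monthly usage samples.

    Fits a least-squares line to the samples and returns the number of
    months after the last sample at which the trend reaches capacity_tb.
    Returns None when usage is flat or shrinking.
    """
    n = len(history_tb)
    if n < 2:
        raise ValueError("need at least two monthly samples")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(history_tb) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history_tb))
             / sum((x - mean_x) ** 2 for x in xs))
    if slope <= 0:
        return None  # no growth trend to extrapolate
    intercept = mean_y - slope * mean_x
    full_at = (capacity_tb - intercept) / slope  # month index of exhaustion
    return max(0.0, full_at - (n - 1))


if __name__ == "__main__":
    # Illustrative data: 2 TB of growth per month against 20 TB of capacity.
    print(months_until_full([10, 12, 14, 16], capacity_tb=20))
```

An estimate like this gives a first-order date for the next storage purchase; SRM tools gather the underlying usage history automatically and typically offer more sophisticated trending.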
SAN problems can be particularly difficult to isolate because of the complex configurations and interrelationships between the servers, switches and storage platforms that often populate a SAN. A working SAN is a digital ecosystem unto itself, and seemingly innocuous changes in one place can have a catastrophic impact on another.
The best SAN troubleshooting is typically proactive and usually involves establishing a performance baseline of critical characteristics before problems ever arise. It's then a simple matter to compare current measurements against the "known good" baseline. This often reveals problems quickly and can identify any performance changes that result from upgrades or reconfigurations. SAN monitoring tools include SANScreen 3.0 from Onaro Inc. and Enterprise Control Center (ECC) from EMC Corp.
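The baseline comparison itself can be sketched in a few lines of Python. The metric names and the 20% tolerance below are hypothetical choices for illustration; monitoring products collect their own sets of metrics and apply their own thresholds.

```python
def compare_to_baseline(baseline: dict, current: dict,
                        tolerance: float = 0.20) -> dict:
    """Return metrics whose relative change from the baseline exceeds tolerance.

    The result maps each flagged metric to its relative change,
    e.g. 0.8 means 80% above the known-good value.
    """
    deviations = {}
    for metric, good in baseline.items():
        if metric not in current or good == 0:
            continue  # nothing to compare, or relative change undefined
        change = (current[metric] - good) / good
        if abs(change) > tolerance:
            deviations[metric] = change
    return deviations


if __name__ == "__main__":
    known_good = {"read_latency_ms": 5.0, "iops": 10000, "port_utilization": 0.40}
    today = {"read_latency_ms": 9.0, "iops": 9500, "port_utilization": 0.42}
    for metric, change in compare_to_baseline(known_good, today).items():
        print(f"{metric}: {change:+.0%} versus baseline")
```

In this example only read latency is flagged, since the other two metrics are within 20% of their known-good values.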
Another key to effective SAN troubleshooting is comprehensive change management policies. By tracking changes and restricting change activities to authorized IT personnel, an administrator can avoid unexpected trouble and quickly correlate help requests with recent SAN changes.
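Correlating a help request with recent changes amounts to filtering the change log by time. The Python sketch below assumes a hypothetical log format (a list of dictionaries with `when`, `who` and `what` keys) and an arbitrary 48-hour look-back window; a real change management system would supply its own records and query interface.

```python
from datetime import datetime, timedelta


def recent_changes(change_log: list, reported_at: datetime,
                   window: timedelta = timedelta(hours=48)) -> list:
    """Return changes made within `window` before a problem was reported.

    Results are ordered most recent first, since the latest change is
    often the first one worth checking.
    """
    hits = [change for change in change_log
            if timedelta(0) <= reported_at - change["when"] <= window]
    return sorted(hits, key=lambda change: change["when"], reverse=True)


if __name__ == "__main__":
    # Hypothetical change records for illustration only.
    log = [
        {"when": datetime(2024, 3, 1, 9, 0), "who": "alice", "what": "rezoned switch A"},
        {"when": datetime(2024, 3, 4, 22, 0), "who": "bob", "what": "upgraded array firmware"},
        {"when": datetime(2024, 3, 5, 8, 0), "who": "alice", "what": "added LUN for HR"},
    ]
    for change in recent_changes(log, reported_at=datetime(2024, 3, 5, 10, 30)):
        print(change["when"], change["who"], change["what"])
```

A filter like this is only useful when every change is actually logged, which is why restricting change activities to authorized personnel matters.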