Data storage components overview

Data storage components are at the core of any enterprise storage system. At the very lowest level, hard discs are the medium that holds vital corporate data.

From mundane memorandums to mission-critical sales records, the choice of hard discs can have a profound impact on the capacity, performance and long-term reliability of any storage infrastructure. But it's unwise to trust valuable data to any single point of failure, so hard discs are combined into groups that can boost performance and offer redundancy in the event of disc faults. At an even higher level, those arrays must be integrated into the storage infrastructure -- combining storage with network technologies to make data available to users over a LAN or WAN. If you're new to storage, or just looking to refresh some basic concepts, this chapter on data storage components can help to bring things into focus.

The lowest level: Hard discs

Hard discs are random-access storage mechanisms that record data on spinning platters (a.k.a. discs) coated with extremely sensitive magnetic media. Magnetic read/write heads step across the radius of each platter in set increments, forming concentric circles of data dubbed "tracks." Hard disc capacity is loosely defined by the quality of the magnetic media (bits per inch) and the number of tracks. Thus, a late-model drive with superior media and finer head control can achieve far more storage capacity than models just six to 12 months old. Some of today's hard drives can deliver up to 750 Gbytes of capacity. Capacity is also influenced by specific drive technologies, including perpendicular recording, which fits more magnetic points into the same physical disc area (a.k.a. areal density).
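As a rough back-of-the-envelope sketch, capacity is approximately the areal density multiplied by the recordable platter area and the number of recording surfaces. The figures below are illustrative assumptions, not specifications from any particular drive:

```python
import math

def capacity_gbytes(areal_density_gbit_in2, outer_r_in, inner_r_in, surfaces):
    """Rough capacity estimate: areal density x recordable annulus x surfaces.

    Real drives use zoned recording and reserve space for servo and spare
    sectors, so this is a ballpark figure only.
    """
    area_in2 = math.pi * (outer_r_in ** 2 - inner_r_in ** 2)  # annulus area
    total_gbits = areal_density_gbit_in2 * area_in2 * surfaces
    return total_gbits / 8  # bits -> bytes

# Hypothetical numbers: 100 Gbit/sq. in. perpendicular media, 3.5-inch platters
# (1.75-in. outer radius, 0.75-in. inner radius), 4 platters = 8 surfaces.
print(round(capacity_gbytes(100, 1.75, 0.75, 8)))  # roughly 785 Gbytes
```

The sketch shows why perpendicular recording matters: doubling areal density doubles the estimate without changing the physical disc at all.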

The performance of a hard disc is heavily influenced by the rotational speed (rpm) of the platters and the interface that connects the drive to its host computer. Speeds from 5,400 to 7,200 rpm are most common in personal computers and secondary storage systems, while 10,000 and 15,000 rpm discs are reserved for servers and primary storage systems. The interface itself manages data transfer to and from the drive. Both ATA and SCSI interfaces are traditional parallel architectures that transfer commands and data across multiple data lines simultaneously. ATA offered lower data rates and was mainly employed in personal computers, while SCSI provided faster data rates and appeared in workstations and servers. SATA and SAS are more current interfaces that pass ATA/SCSI commands serially along a single data wire. The move to serial cabling allows for faster data transfers and simpler (less expensive) connections. Note that the interface has no direct impact on the capacity of a hard disc.
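The effect of rotational speed can be quantified: on average, the desired sector is half a revolution away from the head, so average rotational latency falls directly with rpm. A minimal sketch:

```python
def avg_rotational_latency_ms(rpm):
    """Average rotational latency: time for half a revolution, in milliseconds."""
    return (60_000 / rpm) / 2  # 60,000 ms per minute of rotation

for rpm in (5400, 7200, 10000, 15000):
    print(f"{rpm:>6} rpm: {avg_rotational_latency_ms(rpm):.2f} ms")
```

A 15,000 rpm drive waits an average of 2 ms per access versus roughly 5.6 ms for a 5,400 rpm drive, which is one reason the faster spindles are reserved for servers and primary storage.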

Fibre Channel (FC) is another popular serial hard disc interface frequently found in enterprise storage environments. FC is known for its tremendous speed -- 2 Gbps and (more recently) 4 Gbps -- and for its data integrity features. FC is also a switched interface, so it is possible to create a "fabric" of storage devices and hosts where every host can see every storage device, vastly improving the availability of data. This is a fundamental technology behind the SAN.

Grouping the discs: RAID

Hard discs are electromechanical devices and their working life is finite. Media faults, mechanical wear and electronic failures can all cause problems that render drive contents inaccessible. This is unacceptable for any organization, so tactics are often implemented to protect against failure. One of the most common data protection tactics is arranging groups of discs into arrays. This is known as a redundant array of independent discs (RAID).

RAID implementations typically offer two benefits: data redundancy and enhanced performance. Redundancy is achieved by copying data to two or more discs -- when a fault occurs on one hard disc, duplicate data on another can be used instead. In many cases, file contents are also spanned (or striped) across multiple hard discs. This improves performance because the various parts of a file can be accessed on multiple discs simultaneously -- rather than waiting for a complete file to be accessed from a single disc. RAID can be implemented in a variety of schemes, each with its own designation:

  • RAID-0 -- disc striping is used to improve storage performance, but there is no redundancy.
  • RAID-1 -- disc mirroring offers disc-to-disc redundancy, but capacity is reduced and performance is only marginally enhanced.
  • RAID-5 -- parity information is spread throughout the disc group, improving read performance and allowing data for a failed drive to be reconstructed once the failed drive is replaced.
  • RAID-6 -- multiple parity schemes are spread throughout the disc group, allowing data for up to two simultaneously failed drives to be reconstructed once the failed drive(s) are replaced.

There are additional levels, but these four are the most common and widely used. It is also possible to mix RAID levels in order to obtain greater benefits. Combinations are typically denoted with two digits. For example, RAID-50 is a combination of RAID-5 and RAID-0, sometimes noted as RAID-5+0. As another example, RAID-10 is actually RAID-1 and RAID-0 implemented together (RAID-1+0). For more information on RAID controllers, see the article The new breed of RAID controllers.
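The capacity trade-offs among these levels can be sketched with a little arithmetic. This is a minimal illustration assuming identical drives; real controllers reserve additional space for metadata and hot spares:

```python
def usable_capacity_gb(raid_level, n_drives, drive_gb):
    """Usable capacity for common RAID levels, assuming identical drives."""
    if raid_level == 0:            # striping only -- all space is usable
        return n_drives * drive_gb
    if raid_level == 1:            # mirroring -- half the raw space
        return n_drives * drive_gb // 2
    if raid_level == 5:            # one drive's worth of distributed parity
        return (n_drives - 1) * drive_gb
    if raid_level == 6:            # two drives' worth of distributed parity
        return (n_drives - 2) * drive_gb
    raise ValueError(f"unsupported RAID level: {raid_level}")

# Ten 200 Gbyte drives (2,000 Gbytes raw):
for level in (0, 1, 5, 6):
    print(f"RAID-{level}: {usable_capacity_gb(level, 10, 200)} Gbytes usable")
```

Striping delivers all 2,000 Gbytes with no protection, mirroring protects but halves capacity to 1,000 Gbytes, and the parity levels fall in between -- which is why RAID-5 and RAID-6 are so widely deployed.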

A closer look at storage arrays

Of course, there are many ways to group hard discs, and enterprise storage can easily involve dozens to hundreds of discs arranged into storage arrays. The very largest arrays can store hundreds of terabytes (TB), or even petabytes, of data. The most basic expression of disc grouping is JBOD (just a bunch of discs). This is simply the accumulation of pure capacity and doesn't offer any redundancy or performance benefits. For example, putting five 200 Gbyte drives in a JBOD arrangement simply yields 1 TB of unprotected storage.

As you saw above, RAID arrays group relatively small sets of discs to work cooperatively for redundancy or added performance -- often both. However, redundancy costs drive space. Suppose there are 10 200 Gbyte drives. That's 2 TB (2,000 Gbytes) of raw storage, but mirroring will cut that total in half to 1 TB of mirrored storage. Parity-based RAID configurations like RAID-5 and RAID-6 ease the need for redundant disc space by storing parity information rather than full copies, sacrificing only one or two drives' worth of capacity. The parity data is then used to rebuild the data on a failed drive.

Storage arrays can also be classified as modular or monolithic. A modular storage array, like EMC Corp.'s Clariion AX100, is typically small and self-contained with fewer than 24 drives, designed for the lighter traffic patterns found in small and mid-sized organizations. New modular arrays can be acquired to keep pace with growing storage needs. In contrast, a monolithic storage array, such as EMC's Symmetrix, Hitachi Data Systems Inc.'s Lightning or IBM's DS8000, can be dramatically larger, hosting hundreds of drives with the communication capability to handle heavy utilization. The expense and management overhead of monolithic arrays usually result in just a few key deployments. The line between modular and monolithic arrays is blurring today, however, as features found in high-end arrays frequently appear in smaller, lower end systems.

Clustering is a relatively new concept in storage. Storage clusters are basically groups of storage arrays sharing redundant connections to work cooperatively as a single storage system. The use of multiple arrays can service storage requests very quickly, resulting in superior performance while supporting large numbers of users. There is also inherent redundancy -- when one element of the cluster fails, the other elements take over without interruption to ensure that data is continuously available. Storage clusters are generally deployed where top performance and storage system uptime are most crucial.

Getting storage on the network

Of course, storage is useless unless network users can access it. There are two principal means of attaching storage systems: NAS and SAN. NAS boxes are storage devices behind an Ethernet interface, effectively connecting discs to the network through a single IP address. NAS deployments are typically straightforward and management is light, so new NAS devices can easily be added as more storage is needed. The downside to NAS is performance -- storage traffic must compete with ordinary network traffic on the Ethernet connection. Even so, NAS access is often superior to disc access at a local server.

The SAN overcomes common server and NAS performance limitations by creating a subnetwork of storage devices interconnected through a switched fabric such as FC or iSCSI (also called Internet SCSI or SCSI-over-IP). Both approaches make any storage device visible to any host and offer much greater availability for corporate data. FC is costlier but offers optimum performance, while iSCSI is cheaper but somewhat slower. Consequently, FC is found in the enterprise, while iSCSI commonly appears in small and mid-sized businesses. Either way, SAN deployments are more costly to implement than NAS (in terms of switches, cabling and host bus adapters) and demand far more management effort.
