Tiered storage steps up its game

Tiered storage that matches data types to Fibre Channel, SATA and SAS disk according to performance needs is rapidly gaining ground as a feature of storage subsystems.

Years ago, tiered storage was a relatively simple affair. You had a disk tier and a tape tier, and software automatically moved files from one tier to the other once they'd gone untouched for a set length of time, and moved them back again if they were subsequently needed.

If you were very lucky and had large volumes of data to archive -- such as documents that might need to be recovered faster than they could be read from tape -- you might have an intermediate tier as well, often an optical disk library.

These tiered-storage rules, known as hierarchical storage management (HSM), were fairly simplistic, and the tiers themselves differed greatly in performance and accessibility.
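The age-based rule described above can be sketched in a few lines. This is an illustrative sketch only, not any vendor's HSM implementation: the `FileRecord` type, the function names and the 90-day threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileRecord:
    # Hypothetical record of a file tracked by the HSM software.
    name: str
    last_access: datetime
    tier: str = "disk"


def apply_hsm_policy(files, now, max_idle=timedelta(days=90)):
    """Move files untouched for longer than max_idle from disk to tape.

    The 90-day default is an assumed value, not one taken from the article.
    Returns the names of the files that were migrated.
    """
    moved = []
    for f in files:
        if f.tier == "disk" and now - f.last_access > max_idle:
            f.tier = "tape"
            moved.append(f.name)
    return moved


def recall(f, now):
    """Bring a file back to the disk tier when it is needed again."""
    f.tier = "disk"
    f.last_access = now
```

Running `apply_hsm_policy` on a schedule and calling `recall` on access reproduces the two movements the classic disk-and-tape setup relied on.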

That began to change when different types of hard disk were introduced to businesses, and it changed even faster as storage vendors realised they could put more than one type of disk drive into a single subsystem. Almost all storage suppliers now offer a single box that holds fast Fibre Channel (FC) or SAS drives, as well as cheap high-capacity SATA drives, all managed under the same console.

Add to the mix different levels of data redundancy -- parity RAID for some drive groups, faster mirrored RAID for others, for example -- and your storage tiers look a lot more granular than they used to.

Another big advantage of having several tiers in the same subsystem is that data can be accessed in place, regardless of which tier it's on. This allows applications to run off more than just the primary disk tier. That means you can match the service level offered by the storage tier to the service level needed by an application. For instance, one application might need high performance and replication, while another might only need high performance.

Both block-based (SAN) and file-based (NAS) storage can benefit from tiers. So can all three main classes of data -- unstructured, which is mostly files; structured, such as databases; and semi-structured, the most obvious example being email. Each needs different solutions; the concept is the same, but the tiers and tiering logic aren't.

"Many people limit the concept of tiering to block storage, yet a lot of information resides as files and they can be tiered, too," notes Philippe Nicolas, who chairs the Storage Networking Industry Association (Europe) Data Management Initiative (SNIA DMI).

However, before you can decide which tiers to put your data on, you need to understand the cost of delivering storage to an application. That's not just the cost of the arrays, it's also the cost of housing, powering and cooling them, plus the cost of managing it all.

Then you need to understand your storage service levels (in terms such as performance and recoverability, plus data classification) to match each application's requirements to the storage tiers available.

RTOs, RPOs and ROI

"For example, in the case of an outage, how fast do you need to recover your data?" says Jim Spooner, strategy services practice manager at specialist storage consultancy GlassHouse Technologies UK. "And how much data could you afford to lose? So does the tier need mirroring? And how frequently does it need snapshots?" In storage jargon, the first of those is the recovery time objective (RTO), while the second is the recovery point objective (RPO).

"Then you need a process for selecting the right tier as new applications come along and chargeback mechanisms to assign values to the tiers, otherwise there's no penalty to picking the best service," adds Spooner.
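Spooner's questions about data loss, mirroring and snapshot frequency can be framed as a simple check against the RPO. The sketch below is a hedged illustration: the function name and the list of snapshot intervals are hypothetical, and real arrays will offer different options.

```python
def protection_for(rpo_minutes, snapshot_options=(15, 60, 240, 1440)):
    """Suggest protection that satisfies a recovery point objective.

    rpo_minutes: the most data, measured in minutes, the application
        can afford to lose.
    snapshot_options: assumed snapshot intervals (in minutes) that the
        storage tier can offer; these values are illustrative only.
    """
    # A snapshot schedule can lose up to one full interval of data,
    # so only intervals no longer than the RPO are acceptable.
    fitting = [s for s in snapshot_options if s <= rpo_minutes]
    if not fitting:
        # Even the most frequent snapshot loses too much: mirror instead.
        return {"mirroring": True, "snapshot_interval": None}
    # Choose the least frequent snapshot that still meets the RPO.
    return {"mirroring": False, "snapshot_interval": max(fitting)}
```

For example, `protection_for(90)` settles on the assumed 60-minute snapshot, while `protection_for(0)` falls through to mirroring because no snapshot schedule can guarantee zero loss.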

How many application storage tiers could, or should, you offer? Fife Council -- the third-largest authority in Scotland -- originally planned for four service classes (from bronze to platinum), but has revised that to three, says Roddy Cameron, the council's technical consultant.

"Platinum would be Fibre Channel disk with synchronous replication. Gold would be Fibre Channel but with asynchronous replication, and Silver would be an archive tier on SATA disk," he explains.

"Bronze would be for test and development, also on SATA but with a lower service level," he notes. "But we don't yet have the network infrastructure in place to do synchronous replication, so we will probably just have three tiers for now." He adds that there's also primary backup to disk, and a subsequent copy to tape.
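Cameron's service classes can be written down as a small catalogue. This is a sketch under assumptions: the field names and the `available_classes` helper are invented for illustration, and replication is recorded as `None` where the article does not state it.

```python
# Service classes as Cameron describes them. "replication" is None where
# the article does not say what replication (if any) the class uses.
SERVICE_CLASSES = {
    "platinum": {"disk": "Fibre Channel", "replication": "synchronous",
                 "use": "production"},
    "gold": {"disk": "Fibre Channel", "replication": "asynchronous",
             "use": "production"},
    "silver": {"disk": "SATA", "replication": None, "use": "archive"},
    "bronze": {"disk": "SATA", "replication": None,
               "use": "test and development"},
}


def available_classes(sync_replication_ready):
    """List the classes that can be offered with the current network.

    Classes needing synchronous replication are withheld until the
    infrastructure to support it is in place.
    """
    return [
        name
        for name, spec in SERVICE_CLASSES.items()
        if sync_replication_ready or spec["replication"] != "synchronous"
    ]
```

With `sync_replication_ready=False` the catalogue shrinks from four classes to three, which mirrors the council's decision to hold back Platinum for now.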

But do those different tiers of hard disk have different costs, even if they're on different shelves within the same box?

Certainly they do, says Roger Bearpark, assistant head of ICT for the London Borough of Hillingdon. He adds that Hillingdon's Compellent storage subsystems contain both Fibre Channel and SATA drives, and will probably contain solid-state disks within the next year or so.

"The cost of tiered storage is dramatically different, even within the same box," says Bearpark. "The cost per terabyte for SATA is half to two-thirds the cost of Fibre Channel, and even within SATA tiers there's a difference."
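Bearpark's ratio makes the potential saving easy to estimate. The sketch below is illustrative only: the function names are hypothetical, and any prices or capacities fed into it are assumptions, since the article quotes the ratio but no actual figures.

```python
def sata_cost_range(fc_cost_per_tb):
    """Return (low, high) SATA cost per terabyte.

    Uses the article's ratio: SATA costs half to two-thirds as much
    as Fibre Channel per terabyte.
    """
    return fc_cost_per_tb * 0.5, fc_cost_per_tb * 2 / 3


def saving_from_tiering(total_tb, sata_fraction, fc_cost_per_tb, sata_ratio):
    """Estimate the saving versus putting all capacity on Fibre Channel.

    sata_fraction: share of capacity (0 to 1) that can live on SATA.
    sata_ratio: SATA cost as a fraction of the Fibre Channel cost,
        expected to lie between 0.5 and about 0.67 per the article.
    """
    all_fc = total_tb * fc_cost_per_tb
    sata_tb = total_tb * sata_fraction
    fc_tb = total_tb - sata_tb
    tiered = fc_tb * fc_cost_per_tb + sata_tb * fc_cost_per_tb * sata_ratio
    return all_fc - tiered
```

Calling `saving_from_tiering` with both ends of the ratio gives a best-case and worst-case estimate for a given mix of Fibre Channel and SATA capacity.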

Tiered storage can also improve application performance by reducing the contention for primary storage, whether the other tiers are in separate subsystems or simply on different shelves within the same subsystem. It can also save money because you no longer need to overprovision primary storage.

Bearpark explains that the Compellent system can create different service levels within a single array by using the outer tracks of the disks for one logical volume and the inner tracks for another. The outer tracks are longer, so at a constant rotation speed more data passes under the read/write head on each revolution, giving better performance. (Several other storage subsystem builders claim similar capabilities, including 3PAR, Overland Storage, Pillar Data Systems and Sun Microsystems Inc.'s StorageTek.)

The cost/performance difference between the SATA tiers is relatively small, at perhaps 5%, but Bearpark says that as long as the tiering can be automated, it's worth doing.

"If we had to do a lot of manual work to support tiering, it wouldn't be worthwhile," he adds. "We do it because we have tools that make it easy to do, and because it proves to the council that we're spending money wisely. It also helps put some discipline on how council departments use storage."

Cameron agrees. "It's about enhancing our data management, how we store data according to its value," he says. "There's also the intention to link it to chargeback to encourage more cost-efficient user behaviour."

So what we have are two types of tiered storage: pre-assigned hard disk tiers for applications, and automated tiering where data moves as it ages. Ideally, you would combine the two, with active data on the appropriate disk tier being archived off as it ages.

If this sounds somewhat similar to the last sales pitch you received for information lifecycle management (ILM), there's a good reason for that. The two are closely related.

That's because tiered storage can also be used as an enabler to help meet the demands of ILM and regulatory compliance. Tiering emphasises the operational layer; ILM also covers legal and compliance issues, so it is in many ways a superset of tiering.

"Tiered storage is about aligning the value of the data with the value of the storage. ILM attempts to maintain that link automatically as the data ages and its value changes," explains Nicolas. For instance, a content-addressed file store (CAFS) device could be one of the tiers, with data moved onto it when business rules say it must be retained.

All that links back to the need to understand your applications and business processes, as well as the cost of running storage.

"Tiered storage needs infrastructure, but it's more about business processes -- getting people to change how they manage data and to stop treating storage like a bottomless pit," says Cameron.

"It doesn't do regulatory compliance on its own," he says, "but a tiered approach means you're classifying your data and how long it's kept before it's archived, so it's all tied together."
