RAID rebuild pain points and the alternatives

RAID rebuilds can be a particularly time-consuming and risky part of managing a storage array, but new methods such as wide parity and erasure codes promise relief.

RAID rebuilds can take more than two weeks for a 2 TB drive using RAID 5 or RAID 6, and with every hour that passes you face the increasing likelihood of a second drive failure. But new ways of protecting data and carrying out disk drive rebuilds are coming to market that promise to reduce your exposure to risk.

In this interview Bureau Chief Antony Adshead speaks with Marc Staimer, chief dragon slayer with Dragon Slayer Consulting, about the pain of RAID rebuilds and new products incorporating wide parity and erasure codes.

You can read the transcript below or download the MP3.

Where, in a practical, everyday sense, might users of storage feel the shortcomings of RAID?

Staimer: Where they're going to feel the shortcomings of RAID has to do with large-capacity drives -- 1 TB, 2 TB, 3 TB drives -- in which they have a failure in a RAID group. The most common RAID groups are RAID 5 or RAID 6, and in either of those cases when a drive fails the system has to rebuild the drive. So, you have to take the drive out and put a new drive in or, in some of the more sophisticated storage systems, there's a pool of drives that it pulls from, electronically -- not necessarily physically -- so that it looks like you took the bad drive out and put a new drive in.

In the process of doing that there are a couple of things that will go on. First, if it's a manual process, what if you pull the wrong drive? Then you're in deep trouble, especially if it's a RAID 5. If it's a RAID 5 and you pull the wrong drive, you just lost all the data in your RAID group. If it's a RAID 6, you just made your rebuild incredibly slow.

Let's say you replace it with the right drive. You now have the situation where the system has to rebuild the drive. To rebuild the drive takes time because it's based on a concept called parity. So, it's going to re-create the data over a period of time. If you have a 2 TB drive, which is the most common, high-density drive on the market today, that will mean it will take, as a priority process, approximately 60 hours to rebuild the 2 TB drive. It'll be somewhere in that neighborhood; sometimes a little less, sometimes a little more, depending on the system, the amount of memory, the amount of processing, but roughly about 60 hours.
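As an illustration of the parity idea mentioned above (a sketch added for clarity, not something described in the interview), single-parity RAID can be modeled as a byte-wise XOR across the drives in a group. The block contents and the `xor_blocks` helper are hypothetical; the final lines simply turn the "2 TB in about 60 hours" figure into an effective rebuild rate.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data blocks, one per drive, plus one parity block
# (the single-parity scheme used by RAID 5).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# The drive holding data[1] fails: its block is re-created by XOR-ing
# the surviving data blocks with the parity block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == data[1]

# Effective rebuild rate implied by 2 TB (decimal) in about 60 hours.
rate_mb_s = 2e12 / (60 * 3600) / 1e6
print(f"{rate_mb_s:.1f} MB/s")  # prints "9.3 MB/s"
```

Every block on the failed drive has to be recomputed this way from all the surviving drives, which is why the rebuild competes with application I/O for the whole array.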

In that 60-hour time frame, you have a very high probability of a second drive failure, and that's why most people go to RAID 6, because if that second drive fails during those 60 hours of rebuild, then you're going to lose all the data in the RAID group if it's a RAID 5. If it's a RAID 6, it will rebuild the second drive as well, going through the process for both drives at the same time. Here's the rub for both of those. When a drive is being rebuilt as a priority process, you're going to lose a significant amount of performance out of that storage system -- typically about half, sometimes more, sometimes less, but right around 50%. That means all your applications are going to have a reduced response rate, a reduced throughput rate.

Most enterprises, most large companies, even most SMBs can't really tolerate that, so they'll set the RAID rebuild process, not as a priority process but as a background process. So, that 60-hour time for a rebuild of a RAID 5 or RAID 6 2 TB drive will go to, roughly, 16 days. That's a really long time where you could have a second failure so you have to run it in a RAID 6 environment.
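To see how the longer window changes exposure, here is a rough sketch under loudly stated assumptions: drives fail independently at a constant rate, the annualized failure rate is 3%, and seven drives survive in the group. None of these numbers come from the interview; only the 60-hour and 16-day windows do. Real arrays also see correlated failures and unrecoverable read errors, so an independent-failure model like this probably understates the absolute risk. What it does show is how the exposure grows with the window.

```python
import math

def second_failure_probability(hours, surviving_drives, afr):
    """Chance that at least one surviving drive fails within `hours`,
    assuming independent drives with a constant failure rate derived
    from the annualized failure rate `afr`."""
    rate_per_hour = -math.log(1.0 - afr) / (365 * 24)
    return 1.0 - math.exp(-rate_per_hour * surviving_drives * hours)

AFR = 0.03      # assumed annualized failure rate per drive (hypothetical)
SURVIVORS = 7   # assumed 8-drive group with one drive already failed

priority = second_failure_probability(60, SURVIVORS, AFR)
background = second_failure_probability(16 * 24, SURVIVORS, AFR)

print(f"priority rebuild (60 hours):  {priority:.2%}")
print(f"background rebuild (16 days): {background:.2%}")
print(f"exposure window is {16 * 24 / 60:.1f}x longer")
```

Under this model the risk scales almost linearly with the length of the window, so moving the rebuild to the background multiplies the exposure by roughly the same factor as the rebuild time.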

As you move to a RAID 6 environment, you're now losing more of your usable capacity to parity. What happens when you have a third disk failure? Now we're back to the same issue, so now you're hearing people promoting RAID 6 triple parity. When will it end? At some point, you start looking at this and saying, "This is incredibly complicated for something that was supposed to be incredibly simple to protect the user against a disk drive failure."
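The capacity cost of each extra parity drive is simple arithmetic. The sketch below assumes an eight-drive group, a figure chosen only for illustration, and the `usable_fraction` helper is hypothetical.

```python
def usable_fraction(drives, parity_drives):
    """Fraction of raw capacity left for data when `parity_drives`
    drives' worth of space in the group is given over to parity."""
    if not 0 <= parity_drives < drives:
        raise ValueError("parity_drives must be in the range [0, drives)")
    return (drives - parity_drives) / drives

GROUP_SIZE = 8  # assumed number of drives per RAID group (hypothetical)

for name, parity in [("RAID 5", 1), ("RAID 6", 2), ("triple parity", 3)]:
    share = usable_fraction(GROUP_SIZE, parity)
    print(f"{name}: {share:.1%} usable, survives {parity} drive failure(s)")
# RAID 5: 87.5% usable, survives 1 drive failure(s)
# RAID 6: 75.0% usable, survives 2 drive failure(s)
# triple parity: 62.5% usable, survives 3 drive failure(s)
```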

And of course, there are other types of disk drive failures which fall into this category, such as unrecoverable read errors. So these are some of the issues with RAID.

What kind of alternatives to RAID are coming to market and, briefly, can you explain how they work?

Staimer: Sure, but before I answer that, let me just say that there's no such thing as a free lunch; there are always tradeoffs, so every technology that solves a problem has problems of its own. Having said that, yes, there are a number of alternatives coming to market or in the market already; one is a wide stripe or wide parity. Wide parity is similar to mirroring. Mirroring is RAID 1 or RAID 10, in which you make a complete copy of the data on another drive. The problem with that is, if both drives happen to fail -- that is, the primary copy and the mirrored copy -- then you're SOL, you're hosed, as a friend of mine says. But in real terms, this wide striping goes a bit further.

Instead of copying the data from one drive to another drive, it copies the data across a large number of drives, which is called a wide stripe. It does that with parity, instead of the data. It means rebuilds are really fast. In fact, one of the players that does this today is IBM with XIV. They can rebuild a 2 TB drive in 30 minutes. It's pretty amazing. Of course, there are some other things that come with that, which means you have to use far less than your maximum capacity. Your usable capacity is roughly about half, similar to mirroring, so there are some issues there.
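One way to read the 30-minute figure is that the rebuild work is shared across many drives instead of funneling into a single replacement drive. The sketch below works out the aggregate rate that 30 minutes implies and the per-drive share. The count of 100 participating drives is an assumption for illustration, not a figure from the interview or from IBM, and `rebuild_hours` is a hypothetical helper.

```python
def rebuild_hours(capacity_bytes, per_drive_rate_bytes_s, drives_sharing_work):
    """Hours to re-create `capacity_bytes` when the work is spread evenly
    over `drives_sharing_work` drives, each sustaining the given rate."""
    return capacity_bytes / (per_drive_rate_bytes_s * drives_sharing_work) / 3600

CAPACITY = 2e12             # one 2 TB drive, decimal terabytes
PARTICIPATING_DRIVES = 100  # assumed size of the wide stripe (hypothetical)

# Aggregate rate needed to re-create 2 TB in 30 minutes.
aggregate_rate = CAPACITY / (30 * 60)
per_drive_rate = aggregate_rate / PARTICIPATING_DRIVES

print(f"aggregate: {aggregate_rate / 1e6:.0f} MB/s")         # prints "aggregate: 1111 MB/s"
print(f"per drive: {per_drive_rate / 1e6:.1f} MB/s")         # prints "per drive: 11.1 MB/s"
print(f"check: {rebuild_hours(CAPACITY, per_drive_rate, PARTICIPATING_DRIVES):.1f} hours")
```

Under that assumption each drive contributes a rate close to the roughly 9 MB/s implied by a 60-hour conventional rebuild, which suggests the speedup comes from parallelism, not from faster drives.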

Another one is erasure codes, also known as forward error correction. Erasure codes are used by a number of vendors, including IBM in its XIV, but they're also used by Cleversafe and in EMC's Atmos. They're used in NEC HYDRAstor, and by a variety of other players, with more coming out every day. Cleversafe is one of the ones I consider furthest along in this, but in general erasure codes are a form of object-oriented storage in which you describe the entire object in every slice of that object.

So, for example, let's say you're writing a file and you break that file down into 16 slices, or chunks, and each chunk of data has metadata, or information, about all the other chunks of the entire data. So you might only have to access three, four, five or six chunks of data, depending on the amount of metadata, to see all the data.
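The "any few of the 16 slices" behavior can be sketched with a Reed-Solomon-style code built on polynomial interpolation. This is one common way to build an erasure code; the interview does not say which codes the named products use, so treat it as an assumption. The field size `P`, the function names and the 4-of-16 layout are all hypothetical choices made for illustration.

```python
P = 257  # prime just above the byte range, so every byte value is a field element

def lagrange_eval(points, x):
    """Value at x of the unique polynomial of degree < len(points) that
    passes through the given (x, y) points, with arithmetic modulo P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def encode(data, n):
    """Turn k data symbols into n slices (x, y); the first k carry the data itself."""
    if not len(data) <= n < P:
        raise ValueError("need len(data) <= n < P")
    points = list(enumerate(data, start=1))
    return [(x, lagrange_eval(points, x)) for x in range(1, n + 1)]

def decode(slices, k):
    """Recover the k data symbols from any k distinct slices."""
    if len(slices) < k:
        raise ValueError("not enough slices to reconstruct the data")
    points = slices[:k]
    return [lagrange_eval(points, x) for x in range(1, k + 1)]

data = list(b"RAID")       # k = 4 data symbols
slices = encode(data, 16)  # n = 16 slices, as in the example above

# Lose 12 of the 16 slices; any 4 survivors are enough.
survivors = [slices[5], slices[9], slices[12], slices[15]]
assert decode(survivors, 4) == data
```

The sketch needs Python 3.8 or later for the modular inverse `pow(den, -1, P)`. Redundant slices can take the value 256, which does not fit in a byte; production codes typically work in GF(2^8) to avoid that.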

It's a variation on a hologram, but with data. If you take a hologram of a photograph, cut off half of it and look at what's left from a different angle, you can still see the whole photograph; it's the same thing but with data. So, you can lose big chunks of your data and still access all your data without having to re-create your data. It's very clever technology. It started out in the late '90s [as a theory], and it's hitting the market now. Now, in general, that sounds great, except it's going to add latency anytime you have to add a lot of information about the data to be able to read your data. You've got to read your information before you read your data; therefore it adds latency and affects response time. So, there's no free lunch.
