RAID rebuilds: How do RAID rebuilds work and which is fastest?

RAID rebuilds: How do RAID rebuilds work, how can RAID rebuilds be made quicker and what is the fastest RAID-level rebuild time?

RAID rebuilds re-create data on RAID arrays when disks fail. But how do RAID rebuilds work, what is pre-failure and post-failure replacement, which RAID levels rebuild the quickest and what's the best way of reducing RAID rebuild times?

In this interview, Bureau Chief Antony Adshead speaks with Steve Pinder, principal consultant at GlassHouse Technologies (UK), about the mechanism of RAID rebuilds and how to ensure the quickest possible RAID rebuild times on your arrays.

When do RAID systems carry out rebuilds and what exactly happens when they do?

Steve Pinder: RAID systems protect data against a hard disk failure by allowing the array to re-create the failed disk's data on a spare drive while the failed disk is being replaced. A physical drive failure, such as a cracked platter or a broken circuit board, should not be confused with a logical drive failure. Logical failures are usually caused by some sort of corruption and cannot be cured by replacing a disk drive.

There are two main instances where a hard drive will be replaced in a RAID array:

  • Pre-failure replacement
  • Post-failure replacement

Pre-failure replacement is when an array senses that the hard drive will fail shortly and marks it for replacement. If a hot spare is available, a block-for-block copy will be carried out from the old drive to the hot spare, which will become the active drive. For many arrays, an alert will be sent for the old drive to be replaced.

Post-failure replacement takes place when the hard drive fails unexpectedly, before a pre-failure replacement can be carried out. The data on the failed drive must be rebuilt from the data and parity on the remaining active drives and written to a hot spare. Post-failure replacement takes considerably longer because of the calculations needed to rebuild the data.

How can I cut RAID rebuild times?

Pinder: As we know, pre-failure drive replacements are much quicker than post-failure replacements. Technically, they don't involve RAID rebuilds as the data is copied directly from the old drive to the new drive. To mitigate the risk of drive failures, you should always try to ensure that the RAID array you use is capable of pre-failure replacements.

Once a post-failure replacement is necessary, the RAID rebuild time for a particular drive technology and speed is dependent on three factors:

  • The size of the drives in the RAID set.
  • The number of drives in the RAID set.
  • The priority given on the array to rebuild activities.

The size of the drives is a fairly obvious factor, as it will take a lot longer to replace the data on a 600 GB drive than it will on a 72 GB drive.

The number of drives also affects the rebuild time as the array has to read from each remaining drive to determine the data to put on the replacement drive. A general rule is that the more drives there are in the RAID set, the longer the rebuild time will be.
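As a rough illustration of how the three factors listed above interact, here is a minimal back-of-envelope sketch. It is not taken from the interview: the function name, the throughput figures and the idea of a shared back-end bandwidth limit are all assumptions chosen for illustration, and real arrays will behave differently.

```python
def estimate_rebuild_hours(drive_gb, drives_in_set,
                           drive_mb_s=100.0,    # assumed sustained rate of one drive
                           backend_mb_s=800.0,  # assumed shared back-end bandwidth
                           rebuild_share=0.5):  # assumed fraction of bandwidth given to the rebuild
    """Rough post-failure rebuild estimate for a single-parity RAID set.

    All rates are illustrative assumptions, not measurements.
    """
    if drives_in_set < 3:
        raise ValueError("a single-parity set needs at least three drives")
    if not 0 < rebuild_share <= 1:
        raise ValueError("rebuild_share must be in the range (0, 1]")

    drive_mb = drive_gb * 1000
    surviving = drives_in_set - 1

    # The hot spare is written once, no faster than a single drive allows.
    write_seconds = drive_mb / (drive_mb_s * rebuild_share)
    # Every surviving drive is read in full through the shared back end.
    read_seconds = surviving * drive_mb / (backend_mb_s * rebuild_share)

    # Whichever stage is slower bounds the rebuild.
    return max(write_seconds, read_seconds) / 3600


if __name__ == "__main__":
    for size_gb, count in [(72, 5), (600, 5), (600, 16)]:
        hours = estimate_rebuild_hours(size_gb, count)
        print(f"{size_gb} GB x {count} drives: about {hours:.2f} hours")
```

Under these assumed figures, the larger drive takes longer, the wider RAID set takes longer once reads from the surviving drives saturate the back end, and raising `rebuild_share` shortens the estimate at the expense of host I/O.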

The priority of the rebuild process can be set against host I/O on most RAID arrays. The higher the priority given to the rebuild process, the faster it will be, although this will result in degraded host-access performance.

What are the relative rebuild times for the different RAID types?

Pinder: There are a number of popular RAID levels and the rebuild times differ for each. Here are a few pointers when considering which RAID level to implement:

  • RAID 0: Contrary to what you may expect, there is no redundancy in a RAID 0 environment. If a drive fails, your data is lost.
  • RAID 1: This is a mirrored pair and has a fast rebuild as data is copied block for block from the source to the target.
  • RAID 10: RAID 10's mirrored stripe sets rebuild at a pace similar to that of RAID 1.
  • RAID 5: Single-parity RAID 5 takes longer to rebuild than mirroring because data has to be read from every remaining drive in the set, and rebuild times increase with the number of drives.
  • RAID 6: RAID 6's double parity takes longer than RAID 5 to rebuild, although rebuild times are less of a consideration as two drives can fail without data loss.
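To show why a single-parity rebuild has to read every surviving drive, the following is a minimal sketch of XOR-based reconstruction, the scheme commonly used for RAID 5 parity. The helper name `xor_blocks` and the tiny four-byte "drives" are illustrative assumptions; a real array works stripe by stripe and rotates parity across the drives.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data blocks, one per drive, plus their parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Suppose the drive holding data[1] fails. Its contents can be recovered
# only by reading every surviving block, including the parity block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

assert rebuilt == data[1]
```

A mirrored pair, by contrast, needs only a straight copy from the surviving drive, which fits the faster RAID 1 and RAID 10 rebuilds described above.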
