By combining physical drives and presenting them as a single hard drive to the operating system, RAID technology allows storage pros to store the same data in different places on multiple disks. I/O operations can therefore overlap, which means performance can improve and data storage protection can increase. For organisations considering a RAID deployment, there are multiple factors that need to be looked at, particularly the available levels of RAID technology and the specific needs of their data storage infrastructure.
In this SearchStorage.co.UK podcast, Arun Taneja, senior analyst and founder at Taneja Group, examines the functions and differences of the various RAID levels. You can listen to Arun's thoughts on RAID 0 through RAID 6, or you can read the transcript below on understanding RAID levels.
TABLE OF CONTENTS:
RAID level 0: Striping
RAID level 1: Mirroring and performance improvements
RAID level 3: Byte-level parity
RAID level 4: Block-level parity
RAID level 5: Rotating parity
RAID level 6: Tolerates failure of two disk drives
Technically speaking, RAID 0 is not a RAID level, but it is customarily viewed as RAID. Let's say that you have three disk drives. Instead of writing an entire file to the first disk drive, you split the data up. The first chunk of the file is placed on disk one, the next chunk on disk two and the third chunk on disk three. You then keep rotating around the three drives in that order until the whole file has been written to the array.
You should think of RAID 0 as striping. Essentially, you have broken the file data down into pieces and placed them on the first, second and third disk drives in turn until the whole file is on the array. Instead of reading the data from one disk drive, you now read the data in parallel from all of the disk drives and then combine it on the other end. You end up getting close to the performance of three disk drives, and therefore access to that file is vastly improved. This is classic striping.
So the performance factor in RAID 0 improves because you have three disk drives pumping data back at you. There is, however, no improvement in the availability of the data. If one of those three disk drives dies, your whole file is ruined: with a chunk missing, you no longer have coherent, consistent data.
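The round-robin placement described above can be sketched in a few lines of Python. This is only an illustration, not how any particular controller works: the function names stripe and unstripe, the four-byte chunk size and the use of in-memory lists to stand in for disk drives are all assumptions made for the example.

```python
def stripe(data: bytes, num_disks: int = 3, chunk_size: int = 4) -> list[list[bytes]]:
    """Split data into chunks and deal them out to the disks in rotation."""
    disks = [[] for _ in range(num_disks)]  # each list stands in for one drive
    for offset in range(0, len(data), chunk_size):
        chunk_index = offset // chunk_size
        disks[chunk_index % num_disks].append(data[offset:offset + chunk_size])
    return disks


def unstripe(disks: list[list[bytes]]) -> bytes:
    """Reassemble the data by visiting the disks in the same rotation."""
    pieces = []
    longest = max(len(disk) for disk in disks)
    for row in range(longest):
        for disk in disks:
            if row < len(disk):
                pieces.append(disk[row])
    return b"".join(pieces)


original = b"ABCDEFGHIJKLMNOPQRSTUVWX"
disks = stripe(original)
restored = unstripe(disks)

# Losing one drive leaves holes in the file: there is no redundancy to fall back on.
damaged = unstripe([disks[0], [], disks[2]])
```

With all three lists intact, restored matches the original data. With the middle list emptied, every third chunk is simply gone, which is the availability problem described above.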
RAID 1, also known as mirroring, is essentially where you have two disk drives and whatever you put on disk one, you simultaneously put on disk two. The idea is that if one of those two disks dies, then you have the other disk that is still working and therefore you achieve data availability improvements.
You can also achieve performance improvements with RAID 1. When everything is functioning correctly (both disk drives are spinning and behaving properly, and you're reading data from both of them), your read performance will, for all practical purposes, double.
There is a performance improvement for reads, but not for writes, because every write has to go to both disk drives. You also achieve extra availability of your data, because if one disk drive dies, you can access your information from the other disk drive.
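As a rough sketch of the mirroring behaviour just described, the toy class below writes every block to both drives and serves reads from whichever drive is still healthy. The class name Mirror, its methods and the rule for alternating reads between the two drives are hypothetical choices made for illustration; real implementations balance reads in their own ways.

```python
class Mirror:
    """Toy two-disk mirror: every write goes to both disks."""

    def __init__(self) -> None:
        self.disks = [{}, {}]          # block number -> data, one dict per drive
        self.failed = [False, False]

    def write(self, block: int, data: bytes) -> None:
        # A write costs the same as on a single drive: both copies must be updated.
        for index, disk in enumerate(self.disks):
            if not self.failed[index]:
                disk[block] = data

    def read(self, block: int) -> bytes:
        # While both drives are healthy, alternate between them so each
        # serves roughly half of the reads (an assumed, simplistic policy).
        preferred = block % 2
        for index in (preferred, 1 - preferred):
            if not self.failed[index]:
                return self.disks[index][block]
        raise IOError("both disks have failed")

    def fail(self, disk_index: int) -> None:
        self.failed[disk_index] = True


mirror = Mirror()
mirror.write(0, b"first block")
mirror.write(1, b"second block")
mirror.fail(0)                          # one drive dies
still_there = mirror.read(0)            # served from the surviving drive
```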
RAID 3 is the process of ganging up a certain number of disk drives, with the minimum number of drives being three. For this particular example, we'll use five disk drives. Four of those would be data disk drives and the fifth would be a parity disk drive.
The idea is that you have striped the data on the first four disk drives and then you calculate a parity from those drives to be placed on the fifth disk drive. In this type of situation, if any single one of those disk drives fails, you can re-create its information by using the other four disk drives that are still working.
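To make the parity idea concrete, here is a minimal sketch that assumes parity is the bitwise XOR of the data drives, which is the usual choice for single-parity RAID. The transcript itself does not name the calculation, and the function name xor_blocks and the two-byte sample values are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte strings together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Four data drives, each holding two bytes of a stripe (sample values).
data_disks = [b"\x10\x20", b"\x0f\x01", b"\xaa\x55", b"\x33\xcc"]
parity = xor_blocks(data_disks)         # what the fifth drive would store

# Suppose data drive 2 dies: XOR the three survivors with the parity drive.
survivors = data_disks[:2] + data_disks[3:] + [parity]
rebuilt = xor_blocks(survivors)
```

Because XOR-ing a value with itself cancels out, combining the four surviving drives leaves exactly the contents of the missing one, so rebuilt equals the data that was on drive 2. The same trick works whichever single drive is lost, including the parity drive.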
So RAID 3 is very common in large, sequential workloads, such as video files. You want to be able to read a video file very quickly and you want to keep going from one end to the other. Video applications very often use RAID 3; you lock the first four drives in the example and just start extracting information from those drives in a rapid fashion. Performance and availability both improve greatly: performance because the data is striped across the four data drives, and availability because any one of the five disk drives can die and your data will still be safe.
RAID 4 is very similar to RAID 3 in that one of the five disk drives is always a dedicated parity drive. The difference is that instead of calculating parity at the byte level, RAID 4 calculates it at the block level.
The difference between RAID 3 and RAID 4 is very minor, and is only really applicable when you start looking at the finer art of RAID systems. Beyond performing parity at the block level rather than the byte level, the amount of data that is considered a chunk also differentiates RAID 4 from RAID 3.
RAID 5 is very similar to RAID 3 and is probably the most popular level of the technology. In RAID 5, the last disk drive is not the only drive that contains parity data in the array.
In RAID 5, you rotate the parity across the five disk drives. So in the first stripe, four drives may hold data and the fifth holds parity; in the next stripe, the parity sits on a different drive and the remaining four hold data. In essence, parity is moved around in a round robin-type fashion. All five disk drives have a combination of data and parity.
Again, if a single disk drive in that array dies, you can re-create that information from the remaining four disk drives. So you can tolerate a single disk drive failure, which improves the availability of data. This also improves performance because you are striping the data across the disk drives.
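The round-robin placement of parity can be pictured with a short sketch. The rotation order used here (parity starts on the last drive and moves one drive to the left with each stripe) is just one possible layout chosen for illustration; implementations differ, and the function names parity_disk and layout are hypothetical.

```python
def parity_disk(stripe: int, num_disks: int = 5) -> int:
    """Index of the drive holding parity for a stripe (assumed rotation order)."""
    return (num_disks - 1 - stripe) % num_disks


def layout(num_stripes: int, num_disks: int = 5) -> list[list[str]]:
    """Mark each drive in each stripe as data (D) or parity (P)."""
    rows = []
    for stripe in range(num_stripes):
        p = parity_disk(stripe, num_disks)
        rows.append(["P" if disk == p else "D" for disk in range(num_disks)])
    return rows


# One row per stripe, one column per drive.
for row in layout(5):
    print(" ".join(row))
```

Each printed row contains exactly one P, and over five stripes every drive takes one turn holding parity, so no single drive becomes the dedicated parity drive as in RAID 3 or RAID 4.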
RAID 6 is similar to RAID 5 in terms of striping and parity, with the major difference being that RAID 6 can tolerate two disk drives failing. In RAID 6, up to two disk drives can die and your data will still be available.
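As a quick recap of the failure tolerance discussed for each level, the lookup below records how many drive failures an array can absorb before data is lost. It assumes the configurations used in the examples above (a two-drive mirror for RAID 1 and a single array for the parity levels); the names FAILURES_TOLERATED and survives are invented for this sketch.

```python
# Drive failures each level can absorb, per the descriptions above.
FAILURES_TOLERATED = {
    0: 0,   # striping only, no redundancy
    1: 1,   # two-drive mirror
    3: 1,   # dedicated parity drive, byte level
    4: 1,   # dedicated parity drive, block level
    5: 1,   # rotating parity
    6: 2,   # tolerates two failed drives
}


def survives(level: int, failed_drives: int) -> bool:
    """True if an array at this RAID level still has its data."""
    return failed_drives <= FAILURES_TOLERATED[level]


raid5_after_two = survives(5, 2)   # a second failure is one too many for RAID 5
raid6_after_two = survives(6, 2)   # RAID 6 is still serving data
```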
This was first published in December 2009