As a fundamental of data protection, Raid (redundant array of independent disks) has been around since the mid-1980s. The idea is quite simple: use multiple disk drives to enable data protection (via mirroring or parity) and spread data across all the drives so that any failing unit can be rebuilt from the others.
Raid is implemented in many variants. These include Raid 1/10 for mirroring, which provides good read and write performance; Raid 5 for capacity, which has good read performance but delivers less well on write I/O; and Raid 6, which provides for a higher degree of availability than Raid 5, due to the extra parity data it stores.
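To make the parity idea concrete, here is a minimal sketch of single-parity protection of the kind Raid 5 relies on. The block contents and the three-data-drive layout are hypothetical, and a real array would also rotate parity across drives, which this sketch leaves out.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data blocks, one per data drive in the stripe.
data = [b"AAAA", b"BBBB", b"CCCC"]

# The parity block is the XOR of all data blocks.
parity = xor_blocks(data)

# Simulate losing the second drive, then rebuild its block
# from the surviving data blocks plus the parity block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
```

Because XOR is its own inverse, combining the survivors with parity yields exactly the missing block, which is why every surviving drive in the stripe has to be read in full during a rebuild.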
As well as Raid level, storage administrators must consider other factors that have a bearing on performance.
Stripe set size is the number of disks across which data is written. As the stripe set grows, data is written across more drives, which can result in greater I/O capability.
However, large stripe sets with high-capacity drives can result in failure during a data rebuild, due to unrecoverable read errors. This is where the rebuild fails because a block of data needed to complete the process cannot be read successfully.
Raid rebuild times also increase significantly as drive capacities grow. Rebuilds can now take days to complete, depending on the continuing background workload, while drives of the capacities predicted for the future will incur rebuild times running into months. Even a rebuild that takes a few days leaves production data unprotected for an unacceptable length of time.
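A rough calculation shows why stripe width and drive capacity matter here. The sketch below assumes an unrecoverable read error rate of one per 10^14 bits read, a figure of the kind quoted on drive datasheets rather than one taken from this article, and it assumes errors are independent and that every surviving drive is read in full.

```python
import math

def rebuild_failure_probability(surviving_drives, drive_tb, ure_per_bit=1e-14):
    """Chance of hitting at least one unrecoverable read error in a rebuild.

    ure_per_bit is an assumed per-bit error rate; drive_tb is decimal terabytes.
    """
    bits_read = surviving_drives * drive_tb * 1e12 * 8
    # 1 - (1 - p)^n, computed with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))

# Hypothetical comparison: a 3+1 set versus a 7+1 set of 4TB drives.
narrow = rebuild_failure_probability(surviving_drives=3, drive_tb=4)
wide = rebuild_failure_probability(surviving_drives=7, drive_tb=4)
```

Under these assumptions the wider set is always the riskier one, because more bits have to be read without error before the rebuild can complete.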
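The relationship between capacity and rebuild time is simple arithmetic. The sketch below assumes a rebuild is limited by a sustained rate, here a hypothetical 50MB/s left over after production I/O; the real figure depends on the array and its workload.

```python
def rebuild_hours(drive_tb, rebuild_mb_per_s):
    """Hours needed to rewrite one drive at a sustained rebuild rate.

    Both arguments use decimal units (1TB = 10**12 bytes, 1MB = 10**6 bytes).
    """
    seconds = (drive_tb * 1e12) / (rebuild_mb_per_s * 1e6)
    return seconds / 3600

# Hypothetical drive sizes, all rebuilt at the same assumed 50MB/s.
for capacity_tb in (1, 4, 16):
    print(f"{capacity_tb}TB drive: {rebuild_hours(capacity_tb, 50):.1f} hours")
```

Rebuild time scales linearly with capacity when the rate stays fixed, so drive capacity growing faster than drive throughput translates directly into longer exposure.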
Raid has continued to evolve and we have seen new protection methods that use the essential components of Raid, but distribute data and parity information in new ways.
For example, the idea of building resilient storage from block-level Raid has been implemented in many systems, including HP’s 3PAR platform and IBM’s XIV. The XIV array divides physical disks into 1MB partitions and mirrors them across all devices in the array. For any single disk failure, all disks in the system are involved in the rebuild, making recovery time significantly faster than with traditional Raid.
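The effect of spreading mirrored partitions across every device can be illustrated with a small simulation. This is a sketch only: the pseudo-random placement, the function names and the 12-disk, 10,000-partition system are assumptions made for illustration, not a description of how XIV places data.

```python
import random
from collections import Counter

def place_mirrors(num_partitions, num_disks, seed=0):
    """Give each partition two copies on two different, randomly chosen disks."""
    rng = random.Random(seed)
    return [tuple(rng.sample(range(num_disks), 2)) for _ in range(num_partitions)]

def rebuild_sources(layout, failed_disk):
    """Count how many partitions each surviving disk must supply after a failure."""
    sources = Counter()
    for first, second in layout:
        if first == failed_disk:
            sources[second] += 1
        elif second == failed_disk:
            sources[first] += 1
    return sources

layout = place_mirrors(num_partitions=10_000, num_disks=12)
sources = rebuild_sources(layout, failed_disk=0)
print(f"{len(sources)} disks supply data for the rebuild")
```

Each surviving disk contributes only a small share of the lost partitions, so the rebuild is a parallel read from many devices rather than a long sequential read from one mirror partner.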
The idea of Raid has also been challenged in other ways; hyperscale computing has moved the unit of redundancy up to the server level. Here, the costs of implementing Raid (controller cards and/or software and additional disk capacity) have been replaced by redundant groups of servers.
Some suppliers, such as X-IO, have built black-box sealed-unit disk arrays into which Raid resilience and additional capacity have been built, but which cannot be upgraded or repaired during the lifetime of the device.
Using Raid and SSD
So how relevant is Raid to the new world of flash drives? Unlike spinning disk HDDs, flash drives have no moving parts and are not subject to mechanical failure such as disk head crashes. To improve the life of the device, solid-state devices implement wear leveling and other algorithms to distribute write I/O evenly, because writes concentrated on the same cells would, over time, cause these devices to fail prematurely.
But, despite their differences, flash drives do fail. There is always the risk of component failure (for example, issues with the device controller) and eventually an SSD will fail because it has a limited capacity for write I/O. This means some protection is required to cater for failure scenarios.
The question is how this protection should be achieved. Suppliers offering new all-flash arrays have typically implemented system-wide redundancy to gain the benefits of using all devices for I/O and to evenly distribute write I/O to gain maximum lifetime from all solid-state components.
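Wear leveling can be pictured as always directing the next write to the least-worn block. The class below is a deliberately simplified sketch with hypothetical names; real flash controllers also handle garbage collection, static data and bad blocks, none of which is modelled here.

```python
import heapq

class WearLeveler:
    """Toy allocator that always writes to the block with the fewest erases."""

    def __init__(self, num_blocks):
        # Min-heap of (erase_count, block_id) pairs.
        self.heap = [(0, block) for block in range(num_blocks)]
        heapq.heapify(self.heap)

    def write(self):
        """Pick the least-worn block, record one more erase and return its id."""
        erases, block = heapq.heappop(self.heap)
        heapq.heappush(self.heap, (erases + 1, block))
        return block

    def erase_counts(self):
        """Return the erase counts of all blocks in ascending order."""
        return sorted(count for count, _ in self.heap)

leveler = WearLeveler(num_blocks=4)
for _ in range(10):
    leveler.write()
counts = leveler.erase_counts()
```

With this policy no block is ever more than one erase ahead of any other, which is the property that stops a few heavily written cells from wearing out early.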
Violin Memory, for example, implements a proprietary Raid technology called vRaid. This distributes I/O load across all components and ensures the normal erase cycle encountered when writing to SSD does not affect the performance of other I/O host traffic.
The erase cycle can also delay read I/O from SSD devices, and the resulting performance problems can be mitigated using Raid. Pure Storage’s FlashArray uses a proprietary Raid known as Raid 3D. This treats read I/O delays on a single flash drive as a device failure and reads the data by rebuilding the read request from other devices in the same parity group. This is only possible because of the high performance and consistent response times of solid-state devices.
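The general technique of reading around a slow device can be sketched with simple XOR parity. This is an illustration of the idea only and not Pure Storage's implementation: the function names, the latency list and the 5ms threshold are all assumptions.

```python
def xor_blocks(blocks):
    """XOR equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

def read_block(stripe, parity, index, latency_ms, slow_threshold_ms=5.0):
    """Return (data, reconstructed) for the block at position index.

    If the target device is responding slowly, treat it as failed and
    rebuild the block from the other members of the parity group.
    """
    if latency_ms[index] <= slow_threshold_ms:
        return stripe[index], False
    others = [block for i, block in enumerate(stripe) if i != index]
    return xor_blocks(others + [parity]), True

# Hypothetical three-device parity group; device 1 is busy with an erase.
stripe = [b"abcd", b"efgh", b"ijkl"]
parity = xor_blocks(stripe)
data, reconstructed = read_block(stripe, parity, 1, latency_ms=[1.0, 50.0, 1.0])
```

The trade-off is extra reads on the other devices in exchange for avoiding the stalled one, which only pays off when those other reads are reliably fast.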
SSD Raid in products
Raid has limitations and these are being experienced as individual disk capacities scale into many terabytes. Building arrays from the ground up, especially using SSDs and flash components, offers the opportunity to be creative with new models of data protection that extend the Raid paradigm. But, of the established storage suppliers, only Hitachi Data Systems (HDS) has developed a bespoke flash module, with all other suppliers treating SSDs as traditional hard drives in their Raid implementations. However, as we move forward with new array designs, the traditional view of Raid will become a thing of the past.
EMC offers hybrid flash/HDD VNX and VMAX arrays that implement all standard Raid levels. There are also all-flash versions of its midrange VNX platform that offer Raid 0, Raid 1, Raid 10, Raid 3, Raid 5 and Raid 6 implementations with a maximum of 250 flash drives. Recently, EMC has released XtremIO, its all-flash platform. However, this is not yet on general release and technical details have not been made available.
NetApp’s Data ONTAP supports Raid 4 and Raid DP (its implementation of Raid 6) on SSD devices. Raid 4 disk groups can scale to a maximum of 14 (13D+1P) devices, while Raid DP scales to 28 devices (26D+2P). NetApp has the flexibility to allow any number of data disks up to the maximum configurations. The new EF540 platform from NetApp supports up to 24 2.5” 800GB flash drives and can be configured using Raid 0, Raid 1, Raid 3, Raid 5, Raid 6 or Raid 10.
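The D+P notation makes the capacity cost of parity easy to work out. The helper below is a hypothetical illustration that compares the two maximum group sizes quoted above and ignores spares, formatting and any other system overhead.

```python
def usable_fraction(data_disks, parity_disks):
    """Fraction of raw capacity left for data in a D+P Raid group."""
    return data_disks / (data_disks + parity_disks)

raid4_max = usable_fraction(13, 1)    # 13D+1P
raid_dp_max = usable_fraction(26, 2)  # 26D+2P

print(f"Raid 4 (13D+1P): {raid4_max:.1%} usable")
print(f"Raid DP (26D+2P): {raid_dp_max:.1%} usable")
```

At their maximum sizes the two layouts give up the same proportion of raw capacity to parity, while the Raid DP group can survive two drive failures rather than one.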
HDS supports up to 256 flash drives in its high-end VSP product line, with drive capacities of 200GB and 400GB supporting Raid 6, Raid 5 and Raid 1 (2D+2D) configurations. For larger deployments, the VSP supports up to 192 Accelerated Flash Modules, each of 1.6TB in capacity. These implement wear leveling, compression and other management features on the card and can be combined into Raid 1, Raid 5 and Raid 6 solutions in the same way as traditional SSDs. Hitachi’s HUS (excluding HUS VM) range of unified storage products offers between 120 and 960 SSDs per array and can use Raid 0, Raid 1, Raid 10, Raid 5 and Raid 6.
HP provides flash options in its 3PAR StoreServ and StoreVirtual platforms. StoreVirtual P4900 offers disk Raid 5, Raid 6 and Raid 10. The system also implements network Raid, connecting multiple controllers together for added resilience. This implements Raid 0, Raid 5, Raid 6, Raid 10, Raid 10+1 and Raid 10+2 per logical volume. The 3PAR StoreServ platform offers hybrid and all-flash solutions with support for Raid 1, Raid 5 and Raid MP (multiparity).
IBM offers solid-state drives in its V7000 platform, and 200GB and 400GB MLC drives can be implemented using Raid 0, Raid 1, Raid 5, Raid 6 and Raid 10. The XIV platform uses SSD for caching rather than implementing it as a separate disk tier and so doesn’t use a Raid architecture.
SolidFire, an all-flash startup, has taken a “post-Raid” approach to implementing data protection. Its 3010 and 6010 series of multi-node arrays spread data across the nodes, providing redundancy at the node level in a similar way to some hyperscale computing solutions.
This was first published in May 2013