It’s a fact of life that despite being highly reliable, flash media can go bad.
Drive internals (such as the controller or NAND) can fail, and of course drives have a specific lifetime. Once endurance levels are reached for a particular cell in NAND, then the media is at risk of failing or returning false results.
As a result, we continue to require data protection methods against media failure, using recovery techniques such as redundant array of inexpensive disks (RAID) and erasure coding.
RAID is a protection technique that uses data redundancy to protect against device failure.
There are multiple RAID levels, ranging from simple data mirroring (creating one or more images of an entire drive or group of drives) to systems that calculate recovery information, known as parity, that can mathematically recreate lost data in the event of a device failure.
Common RAID formats include RAID-5 (two or more data disks and a parity disk in a RAID group) and RAID-6 (two or more data disks and two parity disks). The latter is used to provide higher resiliency with large-capacity media devices, where rebuild times can be significant. In practice, for RAID-5/6 systems, data and parity are spread across all media devices, rather than being on dedicated drives.
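To make the idea of parity concrete, here is a minimal sketch of single-parity protection using XOR, the operation commonly associated with RAID-5. The block contents and the helper name `xor_blocks` are illustrative assumptions, not taken from any supplier's implementation, and the second parity in RAID-6 typically needs different maths that is not shown here.

```python
from functools import reduce

def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data blocks, one per data drive
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]

# The parity block is the XOR of all data blocks
parity = xor_blocks(data_blocks)

# Simulate losing the second drive: XOR the survivors with the parity
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data_blocks[1])
```

Because XOR is its own inverse, combining the surviving blocks with the parity block yields the missing block, whichever single block was lost.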
The main issue when using RAID for data protection is that of scalability. Large RAID groups reduce the overhead of the parity space, but they also increase the number of rebuilds, because a group with more drives will experience more failures.
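The trade-off between group size and parity overhead is simple arithmetic. The sketch below, using a hypothetical helper `parity_overhead` and example group sizes chosen purely for illustration, shows how the share of raw capacity given over to parity falls as the group grows.

```python
def parity_overhead(data_drives, parity_drives):
    """Fraction of raw capacity in a RAID group that is consumed by parity."""
    return parity_drives / (data_drives + parity_drives)

# Example group sizes (illustrative only)
for data_drives in (4, 8, 16):
    raid5 = parity_overhead(data_drives, 1)
    raid6 = parity_overhead(data_drives, 2)
    print(f"{data_drives} data drives: RAID-5 {raid5:.1%}, RAID-6 {raid6:.1%}")
```

The same growth that shrinks the overhead also increases the number of drives that must be read to rebuild a failed device.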
We have seen with hard disk drives (HDDs) that increases in drive capacity and RAID group size result in significant recovery times for failed devices, during which data can be unprotected against the subsequent failure of another device in the same RAID group. This is why RAID-6 is used as a protection scheme.
RAID rebuilds also have an impact on performance. All devices in a RAID group are involved in recreating lost data, potentially resulting in degradation of host input/output (I/O) performance, or elongated rebuild times.
Erasure coding is a technique that uses a mathematical function to transform a set of data into a form that includes redundancy in a way that allows the original data to be recreated from a subset of the redundant pieces.
Typically, the coding technique is expressed using two numbers, one defining the number of original pieces of data and another expressing the additional redundant pieces created. For example, an erasure coding scheme could take 10 original pieces of data, transform this into 16 pieces and allow the original data to be recovered using any 10 of the 16.
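A toy version of a 10-of-16 scheme can be sketched with polynomial interpolation, which is the idea behind Reed-Solomon style codes. Everything in this block is an assumption made for illustration: the function names, the use of the small prime field modulo 257, and the sample data. Production systems typically work in a byte-sized field such as GF(2^8) and are far more optimised.

```python
P = 257  # small prime modulus, large enough to hold byte values 0..255

def _interpolate(points, x):
    """Evaluate at x the unique polynomial through `points`, working modulo P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+)
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def encode(data, n):
    """Turn k data values into n (index, value) pieces; the first k are the data."""
    k = len(data)
    points = list(enumerate(data, start=1))
    extra = [(x, _interpolate(points, x)) for x in range(k + 1, n + 1)]
    return points + extra

def decode(pieces, k):
    """Recover the k original values from any k surviving pieces."""
    if len(pieces) < k:
        raise ValueError("not enough pieces to recover the data")
    chosen = pieces[:k]
    return [_interpolate(chosen, x) for x in range(1, k + 1)]

data = list(b"flash data")      # 10 original values
pieces = encode(data, 16)       # 16 pieces in total
survivors = pieces[6:]          # lose six pieces, including original data
recovered = bytes(decode(survivors, 10))
print(recovered == b"flash data")
```

The original values define a polynomial of degree at most nine, and the six redundant pieces are further points on the same curve. Any 10 points pin down that polynomial, which is why any 10 of the 16 pieces are enough.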
At first glance, this coding scheme may seem no more useful than RAID, but erasure coding has obvious benefits when used to protect data that is geographically dispersed across multiple datacentres.
In our example, imagine distributing the 16 pieces of data evenly across four separate datacentre locations, four pieces to each. Losing any single datacentre leaves 12 pieces, more than the 10 needed, so the scheme can recover without having to create entire replicas of the original information. In addition, after each update, the data can be read back from any 10 of the pieces, without the need for traditional replication.
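The arithmetic behind this example can be checked in a few lines. The variable names are illustrative, and the comparison with a single full replica (2x raw capacity) is an assumed baseline rather than a figure from any particular product.

```python
K, N, SITES = 10, 16, 4            # 10-of-16 scheme spread over four datacentres

pieces_per_site = N // SITES       # pieces held at each site
after_one_loss = N - pieces_per_site
after_two_losses = N - 2 * pieces_per_site

survives_one_site = after_one_loss >= K
survives_two_sites = after_two_losses >= K

ec_overhead = N / K                # raw capacity per unit of data
replica_overhead = 2.0             # assumed baseline: one full remote copy

print(pieces_per_site, after_one_loss, survives_one_site, survives_two_sites)
print(ec_overhead, replica_overhead)
```

With this layout the scheme tolerates the loss of one site but not two, while storing 1.6 times the original data rather than the 2 times needed for a full copy.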
RAID vs erasure coding
Both RAID and erasure coding have benefits and disadvantages that make them suitable for different types of workloads.
RAID has typically been deployed as a way of recovering failed media within a single storage array or server, although network RAID implementations do exist.
As already mentioned, RAID has issues with scalability, with RAID-5 being particularly vulnerable to unrecoverable read errors. In this scenario, if a rebuild is taking place and one of the remaining data or parity components experiences a drive read error, then the missing data cannot be recovered. RAID-6 mitigates this issue at the expense of more parity and an impact on performance.
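A rough feel for this exposure comes from a simplified model that treats read errors as independent events. The error rate, drive size and group size below are hypothetical values chosen only for illustration, and the helper name is not from any real tool. Real devices do not behave this neatly.

```python
import math

def rebuild_failure_probability(surviving_drives, drive_bytes, ure_per_bit):
    """Chance of at least one unrecoverable read error while reading every
    surviving drive in full, assuming independent errors per bit (simplified)."""
    bits_read = surviving_drives * drive_bytes * 8
    # 1 - (1 - p)^bits, computed in a numerically stable way
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))

# Hypothetical RAID-5 group: four surviving 4TB drives, assumed error rate
# of one unrecoverable error per 10^14 bits read
p_fail = rebuild_failure_probability(4, 4 * 10**12, 1e-14)
print(f"{p_fail:.0%}")
```

Under these assumptions the risk grows with both drive capacity and group size, which is the scaling problem described above. RAID-6 can tolerate such an error during a single-drive rebuild because a second parity is still available.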
Erasure coding provides greater efficiency in implementing data protection across datacentres.
However, performing the erasure code transformation on data imposes a performance penalty on the application, for both reads and writes. This is because the maths is computationally complex for anything other than simple protection schemes, and because data may have to be read from multiple systems across the wide area network.
Erasure coding therefore provides good resiliency at the cost of performance, explaining why we have typically only seen implementations in systems such as object storage.
Data protection and flash
The use of flash with media protection schemes needs some special considerations.
Flash is a great medium for random read I/O but has a limited write lifetime. The exact number of writes a flash drive can sustain is based on multiple factors, including the type of NAND flash in use and the efficiency of the controller algorithms used to manage the media. Drives can have endurance ratings from as little as 0.1 to 10 device writes per day (DWPD), and this endurance is directly reflected in the price of flash products.
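A DWPD rating can be turned into a total write budget with a simple calculation. The sketch below assumes a five-year rating period and a 1TB drive, both of which are illustrative values rather than figures from any supplier, and the function name is hypothetical.

```python
def lifetime_writes_tb(capacity_tb, dwpd, rating_years=5):
    """Total terabytes that can be written over the rating period.

    rating_years=5 is an assumed period, not a supplier figure.
    """
    return capacity_tb * dwpd * 365 * rating_years

# Hypothetical 1TB drive at the two ends of the DWPD range quoted above
low_endurance = lifetime_writes_tb(1, 0.1)
high_endurance = lifetime_writes_tb(1, 10)
print(round(low_endurance, 1), round(high_endurance, 1))
```

The two ends of the range differ by a factor of 100 in total writes, which helps explain the price gap between read-oriented and write-intensive drives.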
Obviously, any data protection scheme needs to minimise the impact of device writes on the media. A standard RAID-5 implementation will perform two writes for each host I/O write: one for the data and one for the updated parity. There are also two reads (data and parity), but these don’t impact on flash lifetime. RAID-6 implementations require three writes for each host I/O as there are two parity blocks to be updated.
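The I/O counts above follow from the read-modify-write cycle used for small updates. This sketch models only that simple case, with a hypothetical helper name, and ignores caching or any supplier-specific optimisation.

```python
def small_write_io(parity_blocks):
    """Device I/Os for one small host write using read-modify-write.

    The old data block and each old parity block are read, then the new
    data block and each recalculated parity block are written.
    """
    reads = 1 + parity_blocks
    writes = 1 + parity_blocks
    return reads, writes

raid5_reads, raid5_writes = small_write_io(parity_blocks=1)
raid6_reads, raid6_writes = small_write_io(parity_blocks=2)

print("RAID-5:", raid5_reads, "reads,", raid5_writes, "writes")
print("RAID-6:", raid6_reads, "reads,", raid6_writes, "writes")
```

Only the writes count against flash endurance, so the figure that matters here is two device writes per host write for RAID-5 and three for RAID-6.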
Doubling or tripling the I/O count for RAID protection isn’t an appealing scenario for flash drives, and array suppliers need to implement systems that mitigate this problem.
Dell EMC’s XtremIO all-flash array, for example, buffers host I/O until there is sufficient data to write an entire RAID stripe across 23 data and two parity drives. The result is that the write overhead for XDP (XtremIO Data Protection) is 1.2x the number of host writes compared to 3x for standard RAID-6.
In a similar fashion, NetApp Data Ontap minimises “writes in place” by always writing new data to a new location. Most all-flash suppliers have introduced variations on RAID that are flash friendly: Kaminario has K-RAID; Hitachi protects data on its FMD modules using standard RAID implementations; and IBM uses RAID on FlashSystem.
Another write reduction technique used by almost all the array suppliers is to implement data reduction technologies such as compression and deduplication.
HPE 3PAR recently introduced compression to 3PAR OS 3.3.1 to complement data deduplication, which writes only new data to physical media. Dedupe saves on physical space (making flash costs more attractive), but also reduces the number of physical I/Os that hit media, by filtering out duplicate data as updates are initially written to the array. With highly replicated data (like virtual machines or desktops), savings can be significant.
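The way deduplication filters out writes can be sketched with a content fingerprint lookup. This is a generic illustration, not a description of how 3PAR OS or any other product works; the class name, the choice of SHA-256 and the sample blocks are all assumptions.

```python
import hashlib

class DedupeStore:
    """Toy inline deduplication: only previously unseen blocks reach the media."""

    def __init__(self):
        self.blocks = {}            # fingerprint -> block contents ("physical media")
        self.physical_writes = 0

    def write(self, block: bytes) -> str:
        """Store a block and return its fingerprint, skipping duplicates."""
        fingerprint = hashlib.sha256(block).hexdigest()
        if fingerprint not in self.blocks:
            self.blocks[fingerprint] = block
            self.physical_writes += 1
        return fingerprint

store = DedupeStore()

# 100 hypothetical virtual desktops writing the same OS block, plus one unique block
for _ in range(100):
    store.write(b"common operating system block")
store.write(b"user-specific block")

print(store.physical_writes)
```

In this example, 101 host writes result in only two physical writes, which is the effect that makes highly replicated data such as virtual desktops a good fit for deduplication on flash.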
Flash and erasure coding
What about erasure coding and flash?
At its simplest level, RAID-5/6 is similar to erasure coding and we see that being used by VMware in Virtual SAN for data protection across multiple physical vSphere nodes in a cluster.
However, none of the major storage suppliers use erasure coding with their flash products as a protection mechanism against device failure, with the exception of Pure Storage, which uses an N+2 erasure coding scheme on its latest FlashBlade platform.
The benefits of flash with erasure coding are currently lost because existing erasure coding deployments (with the exception of Virtual SAN) are designed to provide geo-dispersed protection, and this adds a level of latency that negates flash performance.
However, we are starting to see the emergence of very large capacity flash drives (Samsung has a 16TB unit; Seagate has demonstrated a 60TB drive) and, as a result, the scaling limitations of RAID as seen in hard disk drives will start to hit flash.
At this point, suppliers will have to look more seriously at data protection using erasure coding, and we could see some interesting developments in storage resiliency in the coming years as a result.