Bigger drives mean the RAID rebuild must become a thing of the past

With 20 TB drives possible by the end of the decade, RAID rebuilds would be unacceptably long. Subdrive and platter-level protection schemes need to be devised to replace them.

A few years ago you could have envisaged a drive shelf with twenty 1 TB drives, giving 20 TB of capacity, protected by a RAID scheme with two parity drives and perhaps a hot spare or two: a pretty resilient shelf.

Now 4 TB drives are appearing, and with drive capacity increasing at, say, 33 percent a year, capacity at least doubles every three years. Let's work the simple maths: 2012: 4 TB; 2015: 8 TB; 2018: 16 TB; 2019: 21 TB. Extrapolating current trends, a 20 TB drive will be possible in 2019: the equivalent capacity of an entire 20-drive shelf of around 2007.
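The projection above can be checked with a few lines of Python. The sketch below compares two growth models from the 2012 baseline of 4 TB: a straight doubling every three years (which is what the 2012 to 2018 figures follow) and strict 33 percent compound annual growth. Both models and the function names are illustrative assumptions, not an industry forecast.

```python
import math

BASE_YEAR = 2012   # 4 TB drives appearing, per the article
BASE_TB = 4.0

def compound_tb(year, annual_growth=0.33):
    """Capacity if it grows by `annual_growth` every year, compounded."""
    return BASE_TB * (1 + annual_growth) ** (year - BASE_YEAR)

def doubling_tb(year, doubling_years=3.0):
    """Capacity if it simply doubles every `doubling_years` years."""
    return BASE_TB * 2 ** ((year - BASE_YEAR) / doubling_years)

def implied_doubling_years(annual_growth):
    """Years needed to double capacity at a compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)

for year in (2012, 2015, 2018, 2019):
    print(year, round(doubling_tb(year), 1), round(compound_tb(year), 1))
print("33% a year doubles every", round(implied_doubling_years(0.33), 2), "years")
```

The doubling model lands on about 20.2 TB in 2019. Strict 33 percent compounding doubles roughly every 2.4 years and would pass 20 TB a year earlier, so the three-year doubling rule is the conservative reading.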

What does this mean? Unless drive reliability increases enormously, we will still need RAID. But rebuilding a 20 TB drive, assuming it still spins at somewhere between 7,200 rpm and 15,000 rpm, will take many days, and that would be absolutely unacceptable.
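A back-of-the-envelope rebuild estimate shows why. The sketch below divides drive capacity by a sustained rebuild throughput. The two throughput figures are hypothetical: one for an idle array rebuilding flat out, one for an array that throttles rebuild I/O so it can keep serving production workloads. Parity computation and controller overheads are ignored, so real rebuilds would be slower still.

```python
def rebuild_hours(capacity_tb, throughput_mb_s):
    """Hours to rewrite a whole drive at a sustained throughput.

    Uses decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes), as drive
    vendors do. Ignores parity computation and controller overheads.
    """
    seconds = capacity_tb * 10**12 / (throughput_mb_s * 10**6)
    return seconds / 3600

# Hypothetical throughputs: flat-out rebuild vs. rebuild throttled for
# production I/O.
for tb in (1, 4, 20):
    for mb_s in (150, 30):
        hours = rebuild_hours(tb, mb_s)
        print(f"{tb:>2} TB at {mb_s:>3} MB/s: {hours:6.1f} h ({hours / 24:.1f} days)")
```

Under these assumptions a 20 TB drive takes about 37 hours to rebuild even flat out, and more than seven days when rebuild traffic is throttled to 30 MB/s.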

Surely drive failures will stop being all-or-nothing events and will instead be treated as failures of a platter or an individual read/write head, with the drive kept in service at reduced capacity. X-IO does this now inside its Intelligent Storage Elements (ISE): sealed drive enclosures that use sophisticated Seagate hard drive diagnostic software to work around failed areas of a drive's capacity and keep an otherwise-failed drive in service.

If this is done, it would help to increase the platter count, so that each platter, and each head, accounts for a smaller share of the drive's capacity and a single failure takes out less data. It might also be worthwhile to add a second stack of read/write heads, so that a failure in one stack can be countered by failing over to the second stack or to a particular head in it.
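The benefit of more platters is simple arithmetic. The sketch below assumes one head per recording surface, two surfaces per platter and capacity spread evenly across surfaces; real drives differ, and the function name is hypothetical. Only the capacity on the failed surface would then need re-protecting, rather than the whole drive.

```python
def capacity_after_head_loss(capacity_tb, platters, failed_heads=1):
    """Usable capacity if each failed head takes out exactly one surface.

    Hypothetical model: two recording surfaces (and two heads) per platter,
    with capacity spread evenly across surfaces.
    """
    surfaces = 2 * platters
    if not 0 <= failed_heads <= surfaces:
        raise ValueError("failed_heads must be between 0 and the surface count")
    return capacity_tb * (surfaces - failed_heads) / surfaces

for platters in (4, 5, 8, 10):
    lost = 20 - capacity_after_head_loss(20, platters)
    print(f"{platters} platters: one failed head loses {lost:.2f} TB")
```

On a 20 TB drive, going from four platters to ten shrinks the data affected by a single head failure from 2.5 TB to 1 TB.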

An active second stack would also increase disk I/O speed, which would itself shorten RAID recovery time and raise overall disk I/O capacity.

With 20 TB in a single drive, there will be much more scope for caching hot data inside the drive. We can envisage hybrid 20 TB drives with, say, a terabyte or so of flash, much like Seagate's Momentus XT, to speed drive I/O.
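To illustrate the idea, here is a toy model of a hybrid drive that keeps recently read blocks in flash and evicts the least recently used one when flash is full. LRU is an assumed policy chosen for simplicity; it is not the firmware policy of the Momentus XT or any other real drive, and the class name is hypothetical.

```python
from collections import OrderedDict

class HybridDriveCache:
    """Toy hybrid drive: an LRU set of hot blocks held in flash.

    Hypothetical sketch; capacities are counted in blocks, and LRU is an
    assumed policy, not any vendor's firmware behaviour.
    """

    def __init__(self, flash_blocks):
        self.flash_blocks = flash_blocks
        self.flash = OrderedDict()  # block number -> True, least recent first
        self.hits = 0
        self.misses = 0

    def read(self, block):
        """Return where the block was served from: "flash" or "disk"."""
        if block in self.flash:
            self.flash.move_to_end(block)  # mark as most recently used
            self.hits += 1
            return "flash"
        self.misses += 1                    # served from the platters
        self.flash[block] = True            # promote the block to flash
        if len(self.flash) > self.flash_blocks:
            self.flash.popitem(last=False)  # evict least recently used block
        return "disk"

cache = HybridDriveCache(flash_blocks=2)
print([cache.read(b) for b in [1, 2, 1, 3, 1, 2]])
```

Block 1 is read repeatedly, so it stays in flash and is served from there, while block 2 is evicted when block 3 arrives and has to come from the platters again.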

There could also be some form of data tiering using the drive's faster outer and slower inner tracks, as the Pillar Axiom array used to do, to speed access to hot data while tolerating slower access for nearline or cold data. A 20 TB desktop drive could combine both techniques to store a vast amount of data, deliver boot and application load times close to those of an all-flash drive, and serve hot data faster than cold.
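A minimal sketch of track-based tiering, assuming the drive tracks a recent access count per extent and knows how many extents fit in its outer (faster) zone. The hottest extents go to the outer tracks and the rest to the inner tracks. The function, extent names and counts are hypothetical; this does not reproduce Pillar Axiom's actual placement logic.

```python
def place_extents(access_counts, outer_zone_extents):
    """Assign extents to the fast outer zone or the slow inner zone.

    access_counts: dict mapping extent id -> recent access count.
    outer_zone_extents: how many extents fit in the outer (faster) tracks.
    Returns a dict mapping extent id -> "outer" or "inner".
    """
    hottest_first = sorted(access_counts, key=access_counts.get, reverse=True)
    placement = {}
    for rank, extent in enumerate(hottest_first):
        placement[extent] = "outer" if rank < outer_zone_extents else "inner"
    return placement

# Hypothetical desktop workload: boot and application files are hot.
counts = {"boot": 900, "apps": 400, "photos": 12, "archive": 1}
print(place_extents(counts, outer_zone_extents=2))
```

A real implementation would also need to migrate extents as their access counts change; this sketch only computes a one-off placement.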

Could such drives be used in enterprise arrays? The array controller software would have to know about them and their properties to take best advantage of them: rather than relying on simple drive-level RAID schemes to cope with failures, it would need to work with any platter-level RAID schemes inside the drives.

Storage array designers could switch to 2.5-inch drives, increase spindle counts, boost array capacities and delay the onset of 20 TB drives and the consequent RAID rebuild problem, but only by a few years; it would still eventually happen.

And there is no chance that flash solid-state drives could replace hard disk drives by 2019. There isn't the fab capacity to do so and, with a flash fab costing upwards of a billion dollars, new fabs will not be built quickly enough to replace hard disk drives for another 20 years or more at the present rate of data growth.

Therefore, there is a severe looming problem. As drive capacities increase and head towards 10 TB and beyond, today's drive-level RAID schemes will become unusable because rebuild times will be too long. Drive failures will have to be treated as partial drive failures, and subdrive-level protection schemes will need to be devised and put in place to work around the failed components or recording areas.

We cannot carry on doing what we are doing now. Engineers at Seagate, Toshiba and Western Digital know this. So too do computer scientists in Silicon Valley and the universities. Expect a data protection and integrity rabbit to be pulled out of the technology hat to cope with this, possibly using erasure coding schemes of the kind seen in object storage.
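The simplest erasure code is single XOR parity, the same idea as RAID 5. The sketch below applies it to a hypothetical subdrive layout in which each chunk of a stripe sits on a different head's surface, so losing one head means rebuilding only that head's chunks from the survivors. Object stores typically use Reed-Solomon-style codes with several parity chunks; this single-parity version tolerates just one lost chunk per stripe.

```python
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode(chunks):
    """Return the data chunks plus one XOR parity chunk (all same length)."""
    if len({len(c) for c in chunks}) != 1:
        raise ValueError("chunks must be equal length")
    return list(chunks) + [reduce(xor_bytes, chunks)]

def recover(stripe, lost_index):
    """Rebuild the single missing chunk by XORing all the survivors."""
    survivors = [c for i, c in enumerate(stripe) if i != lost_index]
    return reduce(xor_bytes, survivors)

# Hypothetical stripe: one chunk per head's surface, plus a parity chunk.
data = [b"head0", b"head1", b"head2"]
stripe = encode(data)
print(recover(stripe, 1))  # the chunk from the failed head, rebuilt
```

Because XOR is its own inverse, XORing the surviving data chunks with the parity chunk cancels everything except the missing chunk.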

It will become a problem, and it will become a solved problem; I'm confident of that. X-IO is pointing the way with how it copes with drive failures. What sweet satisfaction it would be for X-IO Chief Technology Officer Steve Sicola and his drive engineers to be proved right all along, even if it is for solving the worsening RAID rebuild time problem and not for providing a better mousetrap to cope with today's drive unreliability.

Chris Mellor is storage editor at The Register.
