Hybrid flash vs all-flash storage: When is some flash not enough?

When it comes to choosing between hybrid flash and all-flash storage, the question is increasingly not how much flash is enough, but whether you still need any disk at all.

Let's start by taking a brief look at the available options. Much has been written in these pages about the growing acceptance and usefulness of flash storage within enterprise-grade arrays. Initially, flash-based solid-state drives (SSDs) were expensive, so the fastest and least costly way to take advantage of them was to retrofit them to an existing disk-based array – as a tier zero, for example.

As prices fell, tier one flash also became an option, especially for arrays with auto-tiering technology that could automatically move the hottest data onto the fastest tier. However, this still left the array optimised for spinning disks – although that is changing as suppliers update their array software. Other developers therefore realised they could instead design an array optimised for flash, with secondary tiers of cheaper spinning disk for longer-term data.

Remember here that flash is not just a faster version of spinning disk – it is a fundamentally different medium. Yes, it can perform the same tasks as a disk drive, but it works differently, so if your array firmware and your applications continue to address it as a disk you will be wasting much of the advantage available to you. For instance, applications may buffer writes to cope with disk latency, but on flash your developers no longer need to factor that latency into their designs.
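To illustrate the point about write buffering, here is a minimal, hypothetical sketch (the class and parameter names are illustrative, not from any real storage stack) of the kind of application-level batching developers add to amortise disk latency – and the simpler direct path that low-latency flash makes practical:

```python
class BufferedWriter:
    """Hypothetical app-level write buffer: batches many small writes
    into one large I/O to amortise the per-operation latency of
    spinning disk."""

    def __init__(self, backend, flush_threshold=8):
        self.backend = backend            # callable that persists a list of records
        self.flush_threshold = flush_threshold
        self.pending = []

    def write(self, record):
        self.pending.append(record)
        if len(self.pending) >= self.flush_threshold:
            self.flush()

    def flush(self):
        if self.pending:
            self.backend(self.pending)    # one large I/O instead of many small ones
            self.pending = []


def direct_write(backend, record):
    """On flash, per-operation latency is low enough that writing each
    record synchronously is often acceptable, simplifying the app."""
    backend([record])
```

The trade-off the buffer hides – latency versus I/O count – is exactly what changes when the medium underneath is flash rather than disk.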

Lastly, there are the all-flash arrays with no spinning disks at all. Initially these – just like their DRAM-based forerunners from Digital Equipment, Texas Memory Systems and others – targeted the most performance-hungry and latency-sensitive applications where cost was much less of an issue, but as time went by and flash costs fell further, the all-flash approach started to make sense for a much broader spread of enterprise applications.

This last process was greatly assisted by the adoption of denser and therefore cheaper flash technologies such as multi-level cell (MLC) and triple-level cell (TLC), which store two and three bits per cell respectively, and have enabled the creation of capacity and performance-optimised variants of flash. The denser chips are less reliable, but as in several other areas of technology, we can use software to more than compensate for this.
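The density-versus-reliability trade-off above follows directly from the arithmetic of bits per cell – each extra bit doubles the number of voltage levels a cell must reliably distinguish. A quick sketch:

```python
def cell_levels(bits_per_cell):
    """Distinct charge levels a NAND cell must distinguish:
    SLC (1 bit) = 2, MLC (2 bits) = 4, TLC (3 bits) = 8.
    More levels means narrower margins between them, hence
    denser but less reliable storage."""
    return 2 ** bits_per_cell
```

This is why TLC, at eight levels per cell, leans harder on error-correcting software than SLC ever needed to.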

Another factor that has made all-flash arrays easier to adopt is that, while they initially lacked the sophisticated storage management capabilities of the established disk arrays, this is no longer the case. Some may even exceed the capabilities of disk arrays, especially in areas where flash excels, such as continuous data protection (CDP), guaranteed quality of service and the creation of capacity-free snapshots.

Analyse the workload

So why would you still want a hybrid array, and why would you choose a purpose-designed hybrid over a retrofitted one? To start answering those questions you must analyse the workload. The more random it is – and virtual desktop infrastructure (VDI) is highly random, for example – then the more appropriate all-flash will be.

Customer-facing online transaction systems and databases are also sweet spots for all-flash, as is virtualisation in general. Sequential workloads and those with large proportions of cold data are a different matter. Essentially though, what we used to call cache hits and misses still matter, and with all-flash everything can be a cache hit.
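One way to make the workload analysis concrete is to measure how often consecutive accesses are non-sequential. The sketch below is a toy classifier under simplifying assumptions (a flat list of byte offsets, a fixed block size – both illustrative), not a real storage analytics tool:

```python
def random_fraction(offsets, block_size):
    """Toy workload classifier: the fraction of accesses that do not
    immediately follow the previous one. Closer to 1.0 means a more
    random workload, and so a better fit for all-flash; closer to 0.0
    means sequential I/O, where spinning disk remains competitive."""
    if len(offsets) < 2:
        return 0.0
    jumps = sum(1 for prev, cur in zip(offsets, offsets[1:])
                if cur != prev + block_size)
    return jumps / (len(offsets) - 1)
```

A VDI boot storm would score near 1.0 on a measure like this; a backup stream would score near 0.0.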

With raw flash capacity costing an average of perhaps $5 per GB, hybrid arrays also tend to be cheaper, but cost comparisons between disk and flash are nuanced. For instance, flash is more compact and less power-hungry, so what you win on the purchase price of a hybrid you might lose on its operating costs and the space it takes up.

Be aware too that while some all-flash suppliers are already claiming price parity, this is hard to assess because of the effects of the data reduction technologies used to get more data into a given volume of flash. These technologies – the primary ones being data deduplication and compression – can reduce the effective price per GB by a factor of 5:1 or more (to $1 per GB or less), but their performance can vary considerably depending on the type of data.
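The arithmetic behind those parity claims is simple enough to write down. Using the article's own figures ($5 per GB raw, 5:1 reduction), a minimal sketch:

```python
def effective_cost_per_gb(raw_cost_per_gb, reduction_ratio):
    """Effective price per usable GB after data reduction.
    A reduction_ratio of 5.0 means 5:1 - five logical GB of data
    stored in one physical GB of flash."""
    if reduction_ratio <= 0:
        raise ValueError("reduction ratio must be positive")
    return raw_cost_per_gb / reduction_ratio
```

So $5/GB raw flash at 5:1 comes out at $1/GB effective – but only for data that actually reduces at 5:1, which is exactly why the comparison is so workload-dependent.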


As an aside, this is why any decent flash-enabled array needs to include a variety of data reduction tools. For example, data deduplication will work much better on virtual machine images or some file-based tasks than on databases, while compression will work better on databases than for photos or videos. In addition, some arrays allow you to turn these features off on a per-LUN basis – look out for this if you plan to consolidate multiple applications onto a flash array, as your most latency-sensitive applications may need data reduction turned off.
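The data-type sensitivity of deduplication is easy to demonstrate with a toy content-hash model (the block contents below are fabricated stand-ins for VM images and database pages, not real measurements):

```python
import hashlib

def dedup_ratio(blocks):
    """Toy block-level deduplication: store each unique block once,
    keyed by its content hash, and report the logical:physical ratio."""
    unique = {hashlib.sha256(b).hexdigest() for b in blocks}
    return len(blocks) / len(unique)

# Cloned VM images share many identical blocks, so they dedupe well...
vm_blocks = [b"os-block-%d" % (i % 10) for i in range(100)]

# ...while database pages tend to be unique, so they barely dedupe at all.
db_blocks = [b"row-%d" % i for i in range(100)]
```

Here the simulated VM image set reduces 10:1 while the database set reduces 1:1 – which is why an array that can apply (or disable) reduction per LUN is so useful when consolidating mixed applications.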

The advantage of purpose-designed hybrids over retrofits is primarily that the former should accelerate all I/O, not just the data placed on the flash tier. However, this is increasingly true of all hybrid arrays, thanks to improved data management software which automatically promotes the hottest data to flash. Indeed, with some systems you must actively pin data to the flash tier if you want it to stay there regardless of its “temperature”.

Many hybrids use flash as a read-only cache, while others use it as a write cache too, in part to speed up access to slower hard drives. Writes may be cached in RAM until they can be written as an entire flash page, or as sequential input/output (I/O).
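The write-caching behaviour described above – holding writes in RAM until a full flash page has accumulated – can be modelled with a short sketch (names and the fixed page size are illustrative; real controllers also handle flush-on-power-loss and partial-page timeouts):

```python
class PageCoalescer:
    """Toy write cache: buffer small writes in RAM and program flash
    only in whole pages, avoiding the read-modify-erase-write overhead
    of partial-page updates."""

    def __init__(self, page_size, program_page):
        self.page_size = page_size
        self.program_page = program_page  # callable that writes one full page
        self.buffer = b""                 # RAM staging area

    def write(self, data):
        self.buffer += data
        # Flush only complete pages; the remainder waits in RAM.
        while len(self.buffer) >= self.page_size:
            self.program_page(self.buffer[:self.page_size])
            self.buffer = self.buffer[self.page_size:]
```

The same structure also explains the "or as sequential I/O" variant: once writes are staged in RAM, the controller is free to issue them to the backing store in whatever order or granularity suits the medium.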

Interoperability with existing infrastructure

The question is complicated further by whether the new all-flash or hybrid array must interoperate with an existing storage infrastructure. For example, if your current disk-based NetApp filers are coming off maintenance you could consider going all-flash; otherwise you will want at least some interoperability. Adding flash to create a hybrid may be the preferred route; alternatively, if your filers are clustered, check whether you can add an all-flash unit to the cluster, in effect turning the whole thing into a hybrid.

Scalability and the upgrade path are also important. How granular are the all-flash options? Will you end up paying for a bigger system than you need, and will it need a forklift upgrade when it reaches capacity?

In summary, there is little argument now that the first tier should be flash. Major storage suppliers already report that shipments of 15,000rpm disks have almost entirely been replaced by SSDs. The question is what do we choose for the second (and perhaps third and fourth) tiers – disk, or more flash?

Hard disk could be better for some tasks, such as media streaming, and hard disk technology is still evolving, but its properties change as it does so. For example, the latest high-capacity shingled media is good for reading, but difficult to write to and delete from, making it more of an archive medium.

Flash too is a read-optimised technology that presents additional challenges when you need to write to it, such as the requirement to erase used space before rewriting to it. And while flash is currently moving to denser yet more reliable 3D cell structures, current NAND flash technologies may only have a couple of generations left. There are several more non-volatile technologies evolving in the wings though, such as magnetoresistive RAM, ferroelectric RAM and phase-change RAM.

Flash and its successors will therefore migrate to more and more tiers. Disk will still have its uses and advantages but, like tape before it, those uses will become ever more narrowly defined.
