NVMe gives "shared DAS" as an answer for analytics; but raises questions too

Go back 10 or 20 years and direct-attached disk was the norm. That is, just disk in a server.

It all became a bit unfashionable as the virtualisation revolution hit datacentres. Having siloed disk in servers was inherently inefficient and server virtualisation demanded shared storage to lessen the I/O blender effect.

So, shared storage became the norm for primary and secondary storage for many workloads.

But in recent years, we saw the rise of so-called hyperscale computing. Led by the web giants, this saw self-contained nodes of compute and storage aggregated in grid-like fashion.

Unlike enterprise storage arrays, these are constructed from commodity components, and an entire server/storage node is swapped out if faulty, with replication etc handled by the app.

The hyperscale model is aimed at web use cases and in particular the analytics – Hadoop etc – that go with it.

Hyperscale, in turn, could be seen as the inspiration for the wave of hyper-converged combined server and storage products that has risen so quickly in the market of late.

Elsewhere, however, the need for very high performance storage has spawned the apparently somewhat paradoxical direct-attached storage array.

Key to this has been the ascendance of NVMe, the PCIe-based storage protocol that massively boosts I/O performance over the spinning disk-era SAS and SATA and comes close to matching the potential of flash.

From this, vendors have developed NVMe over fabric/network methods that allow flash plus NVMe connectivity over rack-scale distances.

Key vendors here are EMC with its DSSD D5, E8 with its D24, Apeiron, Mangstor, plus Excelero and Pavilion Data Systems.

What these vendors offer is very high performance storage that acts as if it is direct-attached in terms of its low latency and ability to provide large numbers of IOPS.

In terms of headline figures – supply your own pinches of salt – they all claim up to around 10 million IOPS and latency of less than 100μs.
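To put those headline figures in context, Little's law ties together throughput, latency and the number of I/Os in flight. The following is a rough sketch in Python that simply plugs in the claimed numbers. The function names are my own, for illustration only, and it assumes the latency figure is a mean that holds under full load, which vendors do not necessarily promise.

```python
def outstanding_ios(iops, latency_seconds):
    """Little's law: mean I/Os in flight = throughput x mean latency."""
    return iops * latency_seconds


def single_thread_iops(latency_seconds):
    """IOPS for one synchronous thread that waits for each I/O to finish."""
    return 1.0 / latency_seconds


# Vendor headline claims quoted above, taken at their upper bounds.
claimed_iops = 10_000_000
claimed_latency = 100e-6  # 100 microseconds

print("I/Os in flight needed:", round(outstanding_ios(claimed_iops, claimed_latency)))
print("One synchronous thread:", round(single_thread_iops(claimed_latency)), "IOPS")
```

Taken at face value, the two claims together imply about 1,000 I/Os in flight at any moment, while a single thread issuing one I/O at a time would see around 10,000 IOPS. That suggests the headline numbers describe heavily parallel workloads, probably spread across several hosts.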

That’s made possible by taking the storage fabric/network out of the I/O path and profiting from the benefits of NVMe.

In some cases vendors are taking the controller out of the data path too to boost performance.

That’s certainly the case with Apeiron – which does put some processing in HBAs in attached servers but leaves a lot to app functionality – and seems to be so with Mangstor.

EMC’s DSSD has dual “control modules” that handle RAID (proprietary “Cubic RAID”) and presumably DSSD’s internal object-based file layout. E8 appears to run some sort of controller for LUN and thin provisioning.

EMC and Mangstor run on proprietary drives while E8 and Apeiron use commodity cards.

A question that occurs to me about this new wave of “shared DAS” is: Does it matter whether the controller is taken out of the equation?

I tend to think that as long as the product can deliver raw IOPS in great numbers then possibly not.

But, we’d have to ask how the storage controller’s functions are being handled. There may be implications.

A storage controller has to deal with – at a minimum – protocol handling and I/O. On top of that come LUN provisioning, RAID, thin provisioning, possibly replication, snapshots, data deduplication and compression.

All these vendors have dispensed with the last two of these, and Mangstor and Apeiron have ditched most of the rest. Apeiron, for example, offloads much to server HBAs and the app’s own functionality.

So, a key question for potential customers should be how the system handles controller-type functionality. Any processing over and above the fundamentals has to be done somewhere and potentially hits performance. So, is flash capacity over-provisioned to keep performance up while the controller saps it?
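A back-of-envelope way to see why this matters is to model controller work as extra time added to every I/O. The sketch below is a deliberately crude Python model, not a description of any vendor's product: the function name, the fixed queue depth of 1,000 and the 25μs overhead are all hypothetical, and it assumes the overhead adds serially to latency, whereas a real controller may pipeline or parallelise that work.

```python
def iops_at_queue_depth(queue_depth, base_latency_us, controller_overhead_us=0.0):
    """Throughput from Little's law when every I/O pays the controller overhead."""
    total_latency_s = (base_latency_us + controller_overhead_us) / 1e6
    return queue_depth / total_latency_s


# Hypothetical figures: 1,000 I/Os in flight, 100us base latency.
without_controller = iops_at_queue_depth(1000, 100)
with_controller = iops_at_queue_depth(1000, 100, controller_overhead_us=25)

print("No controller overhead:", round(without_controller), "IOPS")
print("25us controller overhead:", round(with_controller), "IOPS")
print("Throughput lost: {:.0%}".format(1 - with_controller / without_controller))
```

On these made-up numbers, 25μs of controller work per I/O costs a fifth of the throughput at the same queue depth. The real figure depends entirely on where and how the vendor does that processing, which is exactly the question to put to them.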

Another question is: despite the blistering performance possible with these shared NVMe-based DAS systems, will they be right for leading/bleeding edge analytics environments?

The workloads aimed at – such as Hadoop but also Splunk and Spark – are intensely memory hungry and want their working dataset all in one place. If you’re still having to hit storage – even the fastest “shared” storage around – will it make the grade for these use cases or should you be spending money on more memory (or memory supplement) in the server?
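One way to frame that trade-off is as a weighted average of memory and storage access times. The Python sketch below is illustrative only. The 100μs storage figure is the vendors' headline latency claim, while the 0.1μs memory access time is my own round-number assumption for DRAM, and the function name is hypothetical.

```python
def average_access_us(memory_hit_ratio, memory_us=0.1, storage_us=100.0):
    """Mean access time when a fraction of requests is served from memory."""
    return memory_hit_ratio * memory_us + (1.0 - memory_hit_ratio) * storage_us


# Compare a working set that fits entirely in memory with ones that spill.
for hit_ratio in (1.0, 0.99, 0.9):
    print(hit_ratio, round(average_access_us(hit_ratio), 3), "us")
```

On those assumed figures, sending just 1% of accesses to even the fastest shared storage makes the average access roughly 11 times slower than an all-in-memory working set, and 10% makes it about 100 times slower. It is a simplification that ignores caching layers and access patterns, but it shows why the memory question deserves asking.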