Is hyper-converged the answer to the NVMe bottleneck?

NVMe offers huge possibilities for flash storage to work at its full potential, at tens or hundreds of times what is possible now.

But it’s early days, and there is no universally accepted architecture to allow the PCIe-based protocol for flash to be used in shared storage.

Several different contenders are shaping up, however. We’ll take a look at them, but first a recap of NVMe, its benefits and current obstacles.

Presently, most flash-equipped storage products rely on methods based on SCSI to connect storage media. SCSI is a protocol designed in the spinning disk era and built for the speeds of HDDs.

NVMe, by contrast, was written for flash. It allows vast increases in the number of I/O queues and the depth of those queues, and enables flash to operate at orders of magnitude greater performance.
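To put the difference in queues and queue depth into numbers, here is a minimal Python sketch. The limits used are commonly cited protocol ceilings and are assumptions, not figures from this article: one queue of 32 commands for SATA/AHCI, one queue of roughly 254 commands for SAS, and around 64K queues of around 64K commands each for NVMe. The `QueueModel` class and its field names are hypothetical, chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueModel:
    """Command-queue limits of a storage protocol (illustrative figures)."""
    name: str
    queues: int   # number of command queues
    depth: int    # commands per queue

    def max_outstanding(self) -> int:
        # Upper bound on commands that can be in flight at once.
        return self.queues * self.depth


# Commonly cited ceilings, assumed here rather than taken from the article.
SATA_AHCI = QueueModel("SATA/AHCI", queues=1, depth=32)
SAS_SCSI = QueueModel("SAS (SCSI)", queues=1, depth=254)
NVME = QueueModel("NVMe", queues=64 * 1024, depth=64 * 1024)

if __name__ == "__main__":
    for model in (SATA_AHCI, SAS_SCSI, NVME):
        print(f"{model.name}: up to {model.max_outstanding():,} outstanding commands")
```

These are theoretical ceilings. Real drives and hosts typically expose far fewer queues, but the gap in available parallelism is the point.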

But NVMe is currently also roadblocked as a shared storage medium.

You can use it to its full potential as add-in flash in the server or storage controller, but when you try to make it work as part of a shared storage setup with a controller, you start to bleed I/O performance.

That’s because – consider the I/O path here from drive to host – the functions of the controller are vital to shared storage. At a basic level the controller is responsible for translating protocols and physical addressing, with the associated tasks of configuration and provisioning of capacity, plus the basics of RAID data protection.

On top of this, most enterprise storage products also provide more advanced functionality such as replication, snapshots, encryption and data reduction.

NVMe can operate at lightning speeds when data passes through untouched. But put it in shared storage and attempt to add even basic controller functionality, and it all slows down.

Some vendors, for example Pure with its FlashArray//X, have said to hell with that for now and put NVMe into their arrays with no change to the overall I/O path. They gain something like 3x or 4x over existing flash drives.
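A simple latency budget shows why an unchanged I/O path caps the gain. The sketch below is a back-of-envelope model, not a measurement of any product: it assumes each I/O costs media time plus a fixed controller overhead, and every number in it is made up for illustration. The function name `end_to_end_speedup` and its parameters are hypothetical.

```python
def end_to_end_speedup(media_old_us: float,
                       media_new_us: float,
                       controller_us: float) -> float:
    """Speedup per I/O when only the media gets faster.

    Assumes total latency = media latency + controller overhead,
    with the controller overhead unchanged by the media swap.
    """
    if media_old_us <= 0 or media_new_us <= 0 or controller_us < 0:
        raise ValueError("latencies must be positive (controller may be zero)")
    old_total = media_old_us + controller_us
    new_total = media_new_us + controller_us
    return old_total / new_total


if __name__ == "__main__":
    # Illustrative values only: media becomes 10x faster (100us -> 10us).
    for overhead in (0, 20, 50):
        gain = end_to_end_speedup(100, 10, overhead)
        print(f"controller overhead {overhead}us -> {gain:.1f}x end to end")
```

In this toy model, media that is 10x faster delivers only 4x end to end once 20 microseconds of controller work sits in the path, and the gain shrinks further as the overhead grows.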

So, how is it proposed to overcome the NVMe/controller bottleneck?

On the one hand we can wait for CPU performance to catch up with NVMe’s potential speeds, but that could take some time.

On the other hand, some – Zstor, for example – have decided not to chase controller functionality, and instead offer multiple NVMe drives as DAS, with NVMe over Fabrics (NVMf) connectivity through to hosts.

A different approach has been taken by E8 and Datrium, with processing required for basic storage functionality offloaded to application server CPUs.

Apeiron similarly offloads to the host, but to server HBAs and application functionality.

But elsewhere, controller functionality is seen as highly desirable, and ways of providing it seem to be focussing on distributing controller function processing across multiple CPUs.

Kaminario’s CTO Tom O’Neill has identified the key issue as the inability of storage controllers to scale beyond pairs or, where they nominally can, their tendency to become pairs of pairs as they scale. For O’Neill, the key to unlocking NVMe will come when vendors can offer scale-out clusters of controllers that can bring enough processing power to bear.

Meanwhile, hyper-converged infrastructure (HCI) products have been built around clusters of scaled-out servers and storage. Excelero has built its NVMesh around this principle, and some kind of convergence with HCI could be a route to providing NVMe with what it needs.

So, with hyper-converged as a rising star of the storage market already, could it come to the rescue for NVMe?