Does NVMe signal the end of the storage controller?

NVMe could boost flash storage performance, but controller-based storage architectures are a bottleneck. Does hyper-converged infrastructure give a clue to the solution?

Non-Volatile Memory Express (NVMe) promises to revolutionise the storage industry by unleashing the performance benefits of flash, but that potential is currently blocked.

So, from a technical perspective, what needs to change to make sure we’re getting the most out of NVMe?

NVMe is a storage protocol that aims to resolve some of the performance bottlenecks that arise when faster storage media meet traditional storage protocols, such as SCSI.

Historically, spinning media has formed the foundation of data storage, with SCSI in various forms – such as SAS, or Serial Attached SCSI – as the basis of data transfer between storage and server. SCSI also underpins external storage networking in the form of Fibre Channel and iSCSI.

The protocol was developed in the days when hard drives were relatively slow compared with main memory. Because hard disk drives (HDDs) were slow to respond – relative to main memory speeds – there was no need for a performance-optimised transfer mechanism.

As we move to flash, that has all changed. HDD access times measured in milliseconds have given way to flash access times typically measured in microseconds.
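The difference of scale can be made concrete with some back-of-envelope arithmetic. The sketch below uses illustrative latency figures – typical orders of magnitude, not measurements of any particular device – to show the I/O rate a single outstanding request implies at each latency:

```python
# Illustrative arithmetic only: latencies are typical orders of magnitude,
# not measurements of any specific HDD or flash device.

def serial_iops(latency_seconds: float) -> float:
    """I/O operations per second achievable with one outstanding request."""
    return 1.0 / latency_seconds

hdd_latency = 5e-3      # ~5 ms of seek and rotation for a spinning disk
flash_latency = 100e-6  # ~100 microseconds for a flash read

print(round(serial_iops(hdd_latency)))    # 200
print(round(serial_iops(flash_latency)))  # 10000
```

Even before any parallelism is considered, the medium alone implies a roughly 50-fold difference in achievable request rates.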

In addition, as a solid-state medium with no moving parts, flash offers the ability to handle parallel input/output (I/O) much more effectively than a spinning disk ever could. This means NVMe looks set to replace SCSI as the key protocol for storage.

NVMe is optimised to reduce traffic between device and processor and to improve the parallel nature of I/O by having many more queues to each NVMe-connected device. It is also possible to use NVMe over a network via NVMe over fabrics (NVMe-oF).
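The effect of those extra queues can be illustrated with Little's law (throughput ≈ outstanding I/Os ÷ latency). The figures below are theoretical ceilings, not device benchmarks – legacy AHCI permits a single queue of 32 commands, while the NVMe specification allows up to 65,535 I/O queues of up to 65,536 commands each, far more than any real device exploits:

```python
# A sketch using Little's law: sustained throughput is bounded by the number
# of I/Os that can be kept in flight divided by per-I/O latency.
# Queue/depth figures are spec-level ceilings, not real device behaviour.

def max_iops(queues: int, depth: int, latency_s: float) -> float:
    outstanding = queues * depth
    return outstanding / latency_s

latency = 100e-6  # assume ~100 microseconds per flash read

print(f"{max_iops(1, 32, latency):,.0f}")    # AHCI-style single queue
print(f"{max_iops(8, 1024, latency):,.0f}")  # a modest NVMe configuration
```

The point is not the absolute numbers but the shape of the limit: with one shallow queue, the protocol caps concurrency long before the flash does.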

NVMe and the storage controller

NVMe offers massive potential to improve I/O performance, but what effect will it have on the storage controller and the traditional array architecture?

Modern storage arrays were built around a number of problems that needed to be resolved. As a shared resource, external arrays consolidate capacity, reduce maintenance (servicing the array rather than individual servers) and improve availability (via Raid protection and storage networking). The storage array therefore carries out a number of important tasks around data availability and protection.

Suppliers have designed products to ensure data loss is minimised using software and hardware Raid, distributing data across many devices and scaling to multi-petabytes of capacity.

Key to these is a storage controller – often a redundant pair, in fact – that sits in front of the storage media and handles I/O, shares and provisions capacity, and deals with data protection and data reduction, for example.

As we move towards the possibility of much improved storage performance, a number of issues arise. One of them has always been lurking, and that is back-end scalability.

Arrays have used proprietary hardware or techniques such as SAS expanders to provide controller access to potentially hundreds of externally connected drives.

When each SAS-connected HDD could manage fewer than 200 random input/output operations per second (IOPS), it was possible to connect many devices at the back end of a storage array, to the controller or through expansion ports and additional disk shelves.

Because all data goes through the controller, SAS adapters have been a potential bottleneck to system throughput. This issue was apparent with the first hybrid and all-flash arrays that had limitations on the number of flash drives that could be supported (as a ratio of overall drives).

The next bottlenecks occur with the ability of the controller to run NVMe drives to their full potential. Modern SAS flash drives can support 300,000 to 400,000 IOPS with up to 2GBps throughput, but put a batch of these drives into a controller and the average performance per drive drops significantly. The problem will be even worse for NVMe, where I/O is vastly more optimised.
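A rough model, with invented but plausible numbers, shows why per-drive averages collapse: once the aggregate capability of the drives exceeds what the controller pair can process, every additional drive simply dilutes a shared ceiling:

```python
# Hypothetical figures to illustrate the controller bottleneck: each SAS
# flash drive is capable of ~350,000 IOPS on its own, but the controller
# pair saturates at ~1 million IOPS regardless of drive count.

DRIVE_IOPS = 350_000        # per-drive capability (illustrative)
CONTROLLER_CAP = 1_000_000  # controller-pair ceiling (illustrative)

def per_drive_average(n_drives: int) -> float:
    aggregate = min(n_drives * DRIVE_IOPS, CONTROLLER_CAP)
    return aggregate / n_drives

print(per_drive_average(2))   # 350000.0 - below the cap, drives run free
print(per_drive_average(24))  # ~41,667 - the controller throttles every drive
```

With NVMe drives capable of far higher individual rates, the gap between what the media could deliver and what the controller lets through only widens.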

This is because even a small number of drives can overwhelm the performance of CPUs in the array controller. This can happen even without additional features such as compression and deduplication that create an additional overhead. 

As we move forward to NVMe-enabled systems, users are not going to be happy with data services that introduce additional latency that NVMe was supposed to eliminate.

Fixing the NVMe bottleneck

How can the issues in storage arrays be resolved to make them work more effectively with NVMe?

Software will form a major part in getting the most from NVMe’s capabilities. I/O paths will need to be rewritten to take out inefficiencies that existed but were acceptable in an era of slow storage devices.

These changes will get us part way to using NVMe more effectively, but will still be limited by the I/O path being through one or more controllers. Without the ability to disaggregate, there will still be a storage array bottleneck.

So, the idea of removing the need to channel all data through a central set of controllers could provide the route to fully exploiting NVMe drives.
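The principle can be sketched in a few lines: if every client can compute, locally and deterministically, which storage node owns a given block, no central controller needs to sit in the data path. The hash-based placement and node names below are hypothetical, not any supplier's actual algorithm:

```python
# A minimal sketch of disaggregation: clients map blocks to storage nodes
# themselves instead of funnelling I/O through a central controller.
# The placement scheme and node names are invented for illustration.

import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d"]

def owner(volume: str, block: int) -> str:
    """Deterministic block-to-node mapping any client can compute locally."""
    key = f"{volume}:{block}".encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return NODES[digest % len(NODES)]

# Every client resolves the same node for the same block, with no
# controller in the path:
print(owner("vol1", 42) == owner("vol1", 42))  # True
```

Real systems layer replication, rebalancing and failure handling on top, but the core idea – placement computed at the edge rather than arbitrated in the middle – is what removes the central bottleneck.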

We can see disaggregation of storage in action already in hyper-converged infrastructure, and storage suppliers are already bringing systems to the market that use this principle.

Excelero, for example, has developed a product that uses RDMA network cards and a high-speed Ethernet network to build out NVMesh, a grid of storage servers that can be used in hyper-converged mode or as a dedicated storage platform. Data transfer can be done between nodes without involving the target node processor, providing for massive scalability.

Datrium has taken a different approach and disaggregated active and inactive data, to deliver low latency I/O from storage local to the compute.

Vexata is another startup bringing disaggregated and parallel storage to the market. Initially the company is selling hardware appliances, but in the long term the focus will be on features in software rather than hardware.

NVMe and the future – no more controllers?

It may well be that the future of storage arrays is not to have controllers at all. Disaggregation promises to resolve the bottlenecks of the controller architecture, but there are lots of challenges here – security, data protection and data optimisation all need to be federated across clients that consume data from the physical storage layer.

The benefits of shared storage may well outweigh the complexities of disaggregation for some time yet. Either way, NVMe is set to radically change storage in an ever-evolving industry.

Join the conversation

1 comment


Hey Chris,

Great article, and one that introduces an important question as to the changes in storage architectures as a result of changing one specific element in an environment (ripples in a pond).

NVMe, in and of itself, is a protocol for accessing persistent media. It is true, as you state, that PCIe-based NVMe does not require a separate adapter on the PCIe bus, and in some cases a device on the PCIe bus can itself act as a storage controller (in terms of functionality).

NVMe-oF (NVMe over Fabrics), however, is a means of taking the NVMe protocol and placing it over a transport-agnostic network. That means, for instance, that I can run NVMe-oF with RDMA-based networks (such as RoCE, iWARP, and InfiniBand) and Fibre Channel, with TCP-based networks currently being worked on inside of NVM Express.

You'll note that NVMe-oF does not require or preclude the use of a storage controller, nor does NVMe for that matter. Storage controllers - for both SCSI and NVMe - are a design option for the storage architecture. In many cases, especially in the near future, storage controllers will continue to be used because they do more than simply "control" devices, they also provide high availability, write-path optimization, in some cases compression, deduplication, etc. (depending on how broad you want to define what a storage controller is and does).

The bottom line, though, is that the decision to use a controller is entirely dependent upon the end-to-end architecture, which in turn depends upon what you want to do with your storage environments.

In some cases you may have a storage controller set in software that sits above (or beside) a box that has multiple PCIe-connected NVMe drives - but you still need something to do volume management. In some cases you'll have hosts that need to have direct data path connection into a NVMe-oF subsystem and need to offload that volume management to the storage device.

All a controller needs to do is manage the NVMe namespace: whether it exists in an array, on a host or in software somewhere else, it will still need to exist. The features and functionality of that controller may be modified due to the nature of NVMe queue pairs (QPs), but it still needs to be there.

Thanks for bringing this up!

[Note: I am on the Board of Directors for NVM Express, but speak only for myself. Any errors are my own.]
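As a footnote to the comment above, the point that namespace management must live somewhere can be illustrated with a toy sketch – the class, device name and sizes here are invented for the example, not drawn from the NVMe specification:

```python
# A toy illustration of the commenter's point: wherever it lives (array,
# host or software layer), something must carve device capacity into
# namespaces and track them. Names and sizes are invented for the example.

class NamespaceManager:
    def __init__(self, device: str, capacity_gb: int):
        self.device = device
        self.free_gb = capacity_gb
        self.namespaces: dict[int, int] = {}  # nsid -> size in GB
        self._next_nsid = 1

    def create(self, size_gb: int) -> int:
        """Allocate a namespace of the requested size; return its ID."""
        if size_gb > self.free_gb:
            raise ValueError("insufficient capacity")
        nsid = self._next_nsid
        self._next_nsid += 1
        self.namespaces[nsid] = size_gb
        self.free_gb -= size_gb
        return nsid

mgr = NamespaceManager("/dev/nvme0", capacity_gb=1000)
print(mgr.create(100))  # 1
print(mgr.free_gb)      # 900
```

Whether this bookkeeping runs in an array controller, on a host or in a distributed software layer is the architectural choice the comment describes – but the function itself does not go away.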