So, from a technical perspective, what needs to change to make sure we’re getting the most out of NVMe?
NVMe is a storage protocol that aims to resolve some of the performance bottlenecks that arise when faster storage media meet traditional storage protocols, such as SCSI.
Historically, spinning media has formed the foundation of data storage, with SCSI in its various forms, such as Serial Attached SCSI (SAS), as the basis of data transfer between storage and server. SCSI commands are also carried over storage networks such as Fibre Channel and iSCSI.
The protocol was developed in the days when hard disk drives (HDDs) were slow relative to main memory. Because the drives were slow to respond, there was little need for a performance-optimised transfer mechanism.
As we move to flash, that has all changed. HDD access times are measured in milliseconds, whereas flash access times are typically measured in microseconds.
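To put those units in perspective, here is a rough sketch that uses Little's law (throughput equals outstanding I/Os divided by latency) to show how access time caps the I/O rate of a single device. The 5ms and 100µs latencies are illustrative assumptions rather than measurements, and the function name is hypothetical.

```python
def max_iops(latency_seconds: float, outstanding_ios: int = 1) -> float:
    """Upper bound on IOPS from Little's law: throughput = concurrency / latency."""
    if latency_seconds <= 0:
        raise ValueError("latency must be positive")
    return outstanding_ios / latency_seconds


# Assumed, illustrative access times (not taken from any datasheet).
HDD_LATENCY_S = 5e-3      # 5 milliseconds
FLASH_LATENCY_S = 100e-6  # 100 microseconds

for name, latency in (("HDD", HDD_LATENCY_S), ("Flash", FLASH_LATENCY_S)):
    print(f"{name}: ~{max_iops(latency):,.0f} IOPS with one outstanding I/O")
```

On those assumptions, the HDD tops out at about 200 IOPS and flash at about 10,000 IOPS per outstanding request, before any parallelism is taken into account.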
In addition, as a solid-state medium with no moving parts, flash offers the ability to handle parallel input/output (I/O) much more effectively than a spinning disk ever could. This means NVMe looks set to replace SCSI as the key protocol for storage.
NVMe is optimised to reduce traffic between device and processor and to improve the parallel nature of I/O by providing many more queues to each NVMe-connected device. It is also possible to use NVMe over a network via NVMe over Fabrics (NVMe-oF).
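The difference in queueing can be sketched with the protocol limits that are commonly quoted: one queue of 32 commands for SATA/AHCI, one queue of roughly 254 commands for SAS, and up to about 64,000 queues of about 64,000 commands each for NVMe. Treat these as approximate, theoretical maxima; real devices and drivers typically expose far fewer queues, and the class and variable names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueModel:
    """Commonly quoted protocol-level queue limits (approximate maxima)."""
    name: str
    queues: int
    depth: int

    @property
    def max_outstanding(self) -> int:
        # Theoretical ceiling on commands in flight to one device.
        return self.queues * self.depth


MODELS = [
    QueueModel("SATA/AHCI", queues=1, depth=32),
    QueueModel("SAS", queues=1, depth=254),          # commonly quoted figure
    QueueModel("NVMe", queues=65_535, depth=65_536),  # specification maximum
]

for model in MODELS:
    print(f"{model.name:10} {model.queues:>6} queue(s) x {model.depth:>6} "
          f"= {model.max_outstanding:,} commands in flight")
```

In practice, the useful point is less the headline number than the ability to give each CPU core its own queue, so that cores do not contend for a single shared queue.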
NVMe and the storage controller
NVMe offers massive potential to improve I/O performance, but what effect will it have on the storage controller and the traditional array architecture?
Modern storage arrays were built around a number of problems that needed to be resolved. As a shared resource, external arrays consolidate capacity, reduce maintenance (servicing the array rather than individual servers) and improve availability (via Raid protection and storage networking). The storage array therefore carries out a number of important tasks around data availability and protection.
Suppliers have designed products to ensure data loss is minimised using software and hardware Raid, distributing data across many devices and scaling to multi-petabytes of capacity.
Key to these is a storage controller (often a redundant pair, in fact) that sits in front of the storage media and handles I/O, shares and provisions capacity, and deals with data protection and data reduction, for example.
As we move towards the possibility of much improved storage performance, a number of issues arise. One of them has always been lurking, and that is back-end scalability.
Arrays have used proprietary hardware or technical systems such as SAS expanders to provide controller access to potentially hundreds of externally connected drives.
When each SAS-connected HDD could manage fewer than 200 random input/output operations per second (IOPS), it was possible to connect many devices at the back end of a storage array, either directly to the controller or through expansion ports and additional disk shelves.
Because all data goes through the controller, SAS adapters have been a potential bottleneck to system throughput. This issue was apparent with the first hybrid and all-flash arrays that had limitations on the number of flash drives that could be supported (as a ratio of overall drives).
The next bottleneck is the controller’s ability to run NVMe drives at their full potential. Modern SAS flash drives can support 300,000 to 400,000 IOPS with up to 2GBps throughput, but put a batch of these drives behind a controller and the average performance per drive drops significantly. The problem will be even worse for NVMe, where I/O is vastly more optimised.
This is because even a small number of drives can overwhelm the performance of CPUs in the array controller. This can happen even without additional features such as compression and deduplication that create an additional overhead.
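The effect can be illustrated with simple arithmetic. The sketch below assumes a hypothetical controller that can process at most one million IOPS in total (a made-up ceiling, chosen only for illustration) and uses the per-drive figures quoted above: 200 IOPS for an HDD and 300,000 IOPS for a SAS flash drive. The function names are likewise hypothetical.

```python
import math

CONTROLLER_IOPS = 1_000_000  # hypothetical controller ceiling (assumption)
HDD_IOPS = 200               # per-drive figure quoted in the article
FLASH_IOPS = 300_000         # lower end of the SAS flash range quoted


def drives_to_saturate(drive_iops: int, controller_iops: int = CONTROLLER_IOPS) -> int:
    """Smallest number of drives whose combined IOPS reaches the controller ceiling."""
    return math.ceil(controller_iops / drive_iops)


def average_iops_per_drive(drive_iops: int, drive_count: int,
                           controller_iops: int = CONTROLLER_IOPS) -> float:
    """Each drive gets an equal share of the controller, capped at its own limit."""
    return min(drive_iops, controller_iops / drive_count)


print("HDDs needed to saturate the controller: ", drives_to_saturate(HDD_IOPS))
print("Flash drives needed to saturate it:     ", drives_to_saturate(FLASH_IOPS))
for count in (2, 4, 12, 24):
    share = average_iops_per_drive(FLASH_IOPS, count)
    print(f"{count:>2} flash drives -> ~{share:,.0f} IOPS per drive")
```

Under these assumptions it takes 5,000 HDDs to saturate the controller but only four flash drives, and a 24-drive shelf would average roughly 42,000 IOPS per drive, a fraction of what each device could deliver on its own.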
As we move forward to NVMe-enabled systems, users are not going to be happy with data services that reintroduce the latency NVMe was supposed to eliminate.
Fixing the NVMe bottleneck
How can the issues in storage arrays be resolved to make them work more effectively with NVMe?
Software will play a major part in getting the most from NVMe’s capabilities. I/O paths will need to be rewritten to take out inefficiencies that were acceptable in an era of slow storage devices.
These changes will get us part way to using NVMe more effectively, but will still be limited by the I/O path being through one or more controllers. Without the ability to disaggregate, there will still be a storage array bottleneck.
So, the idea of removing the need to channel all data through a central set of controllers could provide the route to fully exploiting NVMe drives.
We can see disaggregation of storage in action already in hyper-converged infrastructure, and storage suppliers are already bringing systems to the market that use this principle.
Excelero, for example, has developed a product that uses RDMA network cards and a high-speed Ethernet network to build out NVMesh, a grid of storage servers that can be used in hyper-converged mode or as a dedicated storage platform. Data transfer can be done between nodes without involving the target node processor, providing for massive scalability.
Datrium has taken a different approach and disaggregated active and inactive data, to deliver low latency I/O from storage local to the compute.
Vexata is another startup bringing disaggregated and parallel storage to the market. Initially the company is selling hardware appliances, but in the long term the focus will be on features in software rather than hardware.
NVMe and the future – no more controllers?
It may well be that the future of storage arrays is not to have controllers at all. Disaggregation promises to resolve the bottlenecks of the controller architecture, but there are lots of challenges here – security, data protection and data optimisation all need to be federated across clients that consume data from the physical storage layer.
The benefits of shared storage may well outweigh the complexities of disaggregation for some time yet. Either way, NVMe is set to radically change storage in an ever-evolving industry.