NVMe (non-volatile memory express) is a new flash storage protocol that eliminates much of the overhead of legacy protocols such as SAS and SATA. The result is much lower latency and greater throughput for solid-state drives (SSDs) and storage-class memory.
Native NVMe is connected via the PCIe bus and in the enterprise it is being adopted as the standard for local persistent media. Suppliers are also starting to develop products that allow NVMe to operate over a network, where it is known as NVMe-over-Fabrics or NVMf.
To date, we have seen NVMf implemented on Fibre Channel (FC-NVMe), Ethernet (using RoCE) and InfiniBand. FC-NVMe operates over existing Fibre Channel equipment, albeit at the latest hardware revisions and with new firmware and drivers.
Ethernet requires specific hardware adaptors that support RDMA-over-Ethernet, although NVMe-over-TCP is emerging as a practical way to use NVMe over standard Ethernet-based networks and NICs.
We have looked at the big five storage array makers’ efforts in NVMe, but it is a fertile ground for storage startups, too.
As with any storage technology, the ability to create something that goes faster than existing systems is highly prized. Today’s modern applications in the area of finance and data analytics, to name but two, require scalable low-latency storage.
Startup solutions that can give the customer a business advantage will always be appealing. This aligns with the day-to-day need to reduce costs and improve the performance of existing applications.
Of course, to adopt any new technology requires overcoming some of the technical challenges already being experienced in the market.
It would be relatively simple to drop NVMe into existing products and we have already seen suppliers do that.
Meanwhile, NVMe used with storage-class memory (SCM) such as Intel Optane provides the ability to address storage by the byte, rather than in blocks.
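As a sketch of what byte-addressability means for the programming model, the following Python snippet memory-maps a file and updates a single byte in place. The temporary file here is only a stand-in: on real SCM exposed through a DAX-mounted filesystem, the same pattern would result in stores going directly to the media, with no block rewrite.

```python
import mmap
import os
import tempfile

# Create a small file standing in for a DAX-mapped SCM region.
# (On real storage-class memory this would be a file on a
# DAX-mounted filesystem; here it is just a temporary file.)
path = os.path.join(tempfile.mkdtemp(), "pmem.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * 4096)

# Block-style access would read and rewrite a whole 512B or 4KB
# block; with byte-addressable media we map it and touch one byte.
with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 4096) as m:
        m[100] = 0xFF   # single-byte store, no block rewrite
        m.flush()       # persist (a CPU cache flush on real SCM)

with open(path, "rb") as f:
    data = f.read()
print(data[100])  # -> 255
```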
What kind of challenges exist with current storage architectures? The most apparent is the need to channel I/O through shared controllers.
The controller provides the ability to map logical storage assignment (a LUN or file system, for example) onto physical media. It holds the metadata that implements data services such as deduplication, compression and snapshots.
But controllers also add overhead and act as a bottleneck, channelling I/O through a double-ended funnel.
Another challenge is to address many NVMe drives in parallel. Legacy storage architectures generally use SAS as a back-end storage protocol. While this offers great scalability – many drives can be connected together in a single system – the protocol still runs a single queue per drive, and that limits overall system performance.
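The effect of queue parallelism on that ceiling can be illustrated with Little's Law: achievable IOPS is roughly the number of outstanding commands divided by per-command latency. The latency figure below is illustrative, not a vendor measurement:

```python
# Little's Law: achievable IOPS ~ outstanding commands / latency.
# Assume an illustrative 100 microsecond NAND read latency.
latency_s = 100e-6

def iops(outstanding: int) -> float:
    """Commands completed per second with this many in flight."""
    return outstanding / latency_s

# A SAS-style single queue per drive at depth 32:
print(f"{iops(32):,.0f}")        # 320,000 IOPS ceiling

# NVMe allows many queues per drive; even a modest 8 queues
# at depth 32 raises the ceiling substantially:
print(f"{iops(8 * 32):,.0f}")    # 2,560,000 IOPS ceiling
```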
But simply swapping SAS for NVMe is not practical either. NVMe drives sit on the PCIe bus, which supports far fewer devices per server than a SAS backplane. To counter that limitation, NVMe SSDs are starting to appear in multi-terabyte capacities per drive, which will help.
So how have architectures developed to make full use of NVMe?
There are a range of techniques in use. We will look at a few of them and how suppliers have translated these ideas into products.
NVMe option 1: Eliminate the controller
If the controller represents a bottleneck, then the answer for some has been to remove it from the equation. When there is no funnel to constrict I/O, servers can access much more of the I/O available from each drive by reading and writing to the drive directly.
In these products, the host application server talks directly to the media and bypasses the need for a central set of controllers. Of course, there is a trade-off here because the controller does add value in shared storage.
E8 Storage takes this approach. Metadata is stored on a separate server available to all of the hosts, and a small amount of compute resource is required on each host to manage metadata locally. In future, this could be replaced by SmartNICs that offer embedded processing and can offload metadata management to the RDMA NIC.
Apeiron’s ADS1000 storage appliance allows host servers fitted with custom NICs to talk directly to drives in the 2U chassis. As a result, data services have to be implemented at the host layer, but that adds an overhead of only 2μs to 3μs.
NVMe option 2: Scale out
Vexata implements an architecture that scales front-end connectivity and back-end storage independently. Front-end I/O controllers (IOCs) connect to back-end enterprise storage modules (ESMs) using an Ethernet midplane. Metadata that describes the use of storage media is retained on the ESMs and in DRAM on the IOCs. Processing capacity can be increased by adding more ESMs or IOCs as required.
Pavilion Data has designed hardware that resembles a network switch architecture. A single 4U chassis can accommodate between two and 20 controllers and up to 72 NVMe SSDs. Each controller provides four 100GbE host connections. Performance can be scaled at the front end by adding more controllers and at the back end by adding more drives. Metadata is managed on two redundant management cards.
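Using only the figures quoted above, a quick back-of-envelope calculation shows the aggregate front-end line rate of a fully populated chassis. This is raw link speed, not measured throughput:

```python
# Fully populated Pavilion chassis, per the figures above.
controllers = 20
ports_per_controller = 4
gbit_per_port = 100  # 100GbE

total_gbit = controllers * ports_per_controller * gbit_per_port
print(total_gbit)        # 8000 Gbit/s aggregate line rate
print(total_gbit // 8)   # 1000 GB/s, i.e. ~1 TB/s raw front-end
```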
NVMe option 3: Software-defined
All the startup solutions presented so far are based on new hardware designs, but another route being taken by startups such as Excelero and WekaIO is to eliminate the storage hardware and go completely software-defined.
Of course, there has to be some storage hardware somewhere, but the benefit of the software-defined architecture is that solutions can be implemented either as a hyper-converged infrastructure (HCI), dedicated storage or even in public cloud.
Excelero has created a storage solution called NVMesh that implements NVMe-over-Fabrics through a proprietary protocol called RDDA.
Where RDMA connects multiple servers together and provides direct memory access, RDDA takes that a step further and makes NVMe storage devices accessible across the network. This is achieved without involving the processor of the target server, delivering a highly scalable solution that can be deployed in multiple configurations, including HCI.
WekaIO Matrix is a scale-out file system that is deployed across multiple servers in a cluster. Matrix is capable of scaling to thousands of nodes and supporting billions of files. The Matrix architecture allows direct communication with NVMe media and NICs, bypassing much of the Linux I/O stack.
Applications see what looks like a local file system, although data is distributed and protected across many nodes using a proprietary erasure-coding scheme called DDP (distributed data protection).
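DDP itself is proprietary, but the underlying idea of erasure coding can be sketched with the simplest possible scheme: single-parity XOR striping across nodes. This is a generic illustration only, not WekaIO's actual algorithm, which is more sophisticated:

```python
# Generic single-parity erasure coding sketch: data striped
# across nodes plus a parity stripe that can rebuild a lost one.
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte strings together."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

stripes = [b"node0dat", b"node1dat", b"node2dat"]  # data on three nodes
parity = xor_blocks(stripes)                       # stored on a fourth

# Lose node 1: XOR the surviving stripes with the parity to rebuild it.
recovered = xor_blocks([stripes[0], stripes[2], parity])
print(recovered == stripes[1])  # -> True
```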
Matrix can also be run in public cloud (in Amazon Web Services today) on virtual instances that support local NVMe or SSD storage.
NVMe future developments
The systems we have discussed here use Ethernet/RDMA, InfiniBand or Fibre Channel to network storage and application servers together.
NVMe/TCP is emerging as the next evolution of NVMe-over-Fabrics and should provide the ability to implement high-speed storage with more commodity hardware.
As a result, we may see more startups in this space, as the technology required to build solutions becomes more mainstream.