The Gen-Z Consortium was unveiled to the world in October 2016 with the aim of bringing to market a high-performance standard for connecting storage-class memory to processors in a server.
Gen-Z joins two other nascent standards announced this year – OpenCAPI and CCIX. The former is focused on delivering a standard for connecting accelerator hardware (such as GPUs) to processors, while the latter is focused on ensuring cache coherency across multiple devices.
Common to all three efforts is the realisation that existing server architectures may struggle to cope with the growing volumes of data that applications need to crunch through, and the need for it to be processed in real time (or as quickly as possible) to deliver maximum value.
All three initiatives are also focused on the development of open standards, perhaps in an effort to avoid a replay of Intel’s dominant position in server technologies.
Interestingly, each consortium counts a number of key industry players among its members, but Intel’s name is notably absent – at least for now.
Gen-Z is perhaps the most interesting of the three, in that it is designed to let servers take full advantage of new “storage-class” memory technologies, such as Intel’s 3D XPoint or Magnetoresistive RAM (MRAM).
These blur the boundaries between memory and storage, typically offering higher access speeds than flash drives (but not as fast as DRAM), while offering higher densities than the DRAM chips that make up main memory.
Support memory semantic operations
Another key feature is that they support memory semantic operations, so they can be accessed like DRAM, at the level of individual bytes or words, instead of being treated like storage, which reads and writes data wholesale in large blocks.
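The contrast between memory semantic access and block access can be illustrated with a minimal Python sketch. Everything in it is hypothetical and for illustration only: the class names, the 4,096-byte block size and the byte counters are assumptions, and it models neither Gen-Z nor any real device.

```python
BLOCK_SIZE = 4096  # assumed block size in bytes, for illustration only


class BlockDevice:
    """Toy storage device that only transfers whole blocks."""

    def __init__(self, num_blocks):
        self.data = bytearray(num_blocks * BLOCK_SIZE)
        self.bytes_transferred = 0

    def read_block(self, index):
        start = index * BLOCK_SIZE
        self.bytes_transferred += BLOCK_SIZE
        return bytearray(self.data[start:start + BLOCK_SIZE])

    def write_block(self, index, block):
        assert len(block) == BLOCK_SIZE
        start = index * BLOCK_SIZE
        self.data[start:start + BLOCK_SIZE] = block
        self.bytes_transferred += BLOCK_SIZE


class ByteAddressableMemory:
    """Toy memory device that supports load/store on individual bytes."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.bytes_transferred = 0

    def load(self, address):
        self.bytes_transferred += 1
        return self.data[address]

    def store(self, address, value):
        self.data[address] = value
        self.bytes_transferred += 1


def update_byte_on_block_device(device, address, value):
    """Changing one byte requires a read-modify-write of the whole block."""
    index, offset = divmod(address, BLOCK_SIZE)
    block = device.read_block(index)
    block[offset] = value
    device.write_block(index, block)


disk = BlockDevice(num_blocks=4)
update_byte_on_block_device(disk, address=5000, value=0x2A)

memory = ByteAddressableMemory(size=4 * BLOCK_SIZE)
memory.store(5000, 0x2A)

print(disk.bytes_transferred)    # 8192: one block read plus one block written
print(memory.bytes_transferred)  # 1: a single byte stored
```

In this toy model, changing a single byte on the block device moves two full blocks, while the byte-addressable device moves only the byte itself.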
Existing interfaces for hooking up storage are not designed for this mode of operation, lack the required bandwidth, and would introduce too much latency.
Gen-Z aims to address this with a new interface that can scale to several hundred gigabytes per second, with latency below 100ns. The standard, due for completion by the end of this year, specifies serial links that can be bonded together for more bandwidth, in a similar fashion to lanes in the PCI Express system bus.
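As a rough illustration of how bonding lanes scales bandwidth, the arithmetic can be sketched in Python. The per-lane rate used here is a hypothetical figure chosen only for the example, not a value taken from the Gen-Z specification, and the calculation ignores encoding and protocol overhead.

```python
import math


def aggregate_gbytes_per_second(lanes, gbits_per_lane):
    """Raw bandwidth of bonded lanes in GB/s, ignoring encoding overhead."""
    return lanes * gbits_per_lane / 8


def lanes_needed(target_gbytes_per_second, gbits_per_lane):
    """Smallest number of bonded lanes whose raw rate meets the target."""
    return math.ceil(target_gbytes_per_second * 8 / gbits_per_lane)


ASSUMED_GBITS_PER_LANE = 25  # hypothetical signalling rate, not from the Gen-Z spec

print(aggregate_gbytes_per_second(16, ASSUMED_GBITS_PER_LANE))  # 50.0
print(lanes_needed(200, ASSUMED_GBITS_PER_LANE))                # 64
```

Under that assumption, 16 bonded lanes would carry a raw 50GB/s, and a 200GB/s target would need 64 lanes.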
Gen-Z is being designed to use Ethernet signalling and encoding, which doesn’t mean it is based on Ethernet, just that it allows the use of existing Ethernet line driver silicon for electrical or optical connections.
Because Gen-Z supports memory semantic operations, it could also be used to connect DRAM to a processor, but it is not intended to replace the processor’s local memory bus, at least not at first.
“There is still a need for highest speed memory to be connected directly to compute, and that’s going to come with DDR5 in the next generation [of servers],” says Kurtis Bowman, Gen-Z Consortium president and director of Server Solutions for Dell EMC. “This [Gen-Z] is a complement that allows for tiering to start to occur with memory like we’ve seen with storage.”
Gen-Z permits memory to be connected directly to a host processor as normal, or to be connected via a switched fabric as part of a larger memory pool shared between many processors. The consortium talks about “rack scale resource pools”, which means storage-class memory could be deployed in a separate enclosure in a datacentre rack, in a similar way to storage arrays, allowing for a larger pool than could be fitted inside a single server.
Under the hood of OpenCAPI
Meanwhile, the OpenCAPI approach holds out the promise of enabling a flexible heterogeneous compute environment in which accelerator hardware such as GPUs and FPGAs can collaborate more effectively on workloads alongside traditional CPUs in a server.
OpenCAPI has evolved from CAPI (Coherent Accelerator Processor Interface), a standard that IBM developed for its current Power8 processor chips, which enabled an accelerator connected via the PCI Express bus to share a memory space with CPU cores on the chip. In effect, the accelerator behaves as if it were an on-chip peer to the CPU cores.
This eliminates much of the need to transfer data backwards and forwards between CPU and accelerator, boosting applications such as analytics that can involve very large datasets.
While CAPI operated as a protocol over the existing PCI Express 3.0 bus, OpenCAPI will be implemented in IBM’s upcoming Power9 chips to operate over a new transport based on Nvidia’s NVLink 2.0 interconnect, which offers a higher speed of 25Gbps per lane. With the OpenCAPI Consortium counting AMD, Dell EMC, Google and HPE among its backers, the standard is likely to see adoption beyond IBM’s Power server systems.
AMD’s involvement is of particular note, as the chipmaker is seeking to re-enter the x86 server market soon with its Zen processor-based Opteron offerings, and the firm already has a lot of experience in getting CPU and GPU cores to work closely together.
In contrast, Intel is approaching the accelerator problem by working to integrate FPGA capability inside its Xeon server chips. The firm’s acquisition of FPGA specialist Altera last year was key to these plans.
CCIX aims for cache coherence
AMD and IBM are both members of the third consortium, CCIX, along with ARM, Broadcom and others. CCIX stands for Cache Coherent Interconnect for Accelerators, and the consortium aims to create a standard way to ensure cache coherency when multiple devices are accessing a shared area of memory.
Cache coherency is an issue in multi-processor systems. If one processor changes some shared data in its cache, there needs to be a mechanism to ensure every other cached copy reflects the change. The key part of CCIX is that it aims to do this in a way that is independent of any single processor architecture or instruction set, so customers are not tied to any single supplier’s system.
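The underlying problem can be shown with a minimal Python sketch of one generic approach, write-invalidate with write-through. This is not the CCIX protocol, whose details are not described here; the class names and the policy are assumptions made purely to illustrate why stale cached copies have to be dealt with.

```python
class SharedMemory:
    """Toy shared memory that invalidates other caches on every write."""

    def __init__(self):
        self.values = {}
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def write(self, writer, address, value):
        self.values[address] = value
        # Tell every other cache to drop its now-stale copy.
        for cache in self.caches:
            if cache is not writer:
                cache.invalidate(address)


class Cache:
    """Toy per-device cache (hypothetical, write-through)."""

    def __init__(self, name, memory):
        self.name = name
        self.memory = memory
        self.lines = {}
        memory.attach(self)

    def read(self, address):
        # On a miss, fetch the current value from shared memory.
        if address not in self.lines:
            self.lines[address] = self.memory.values.get(address, 0)
        return self.lines[address]

    def write(self, address, value):
        self.lines[address] = value
        self.memory.write(self, address, value)

    def invalidate(self, address):
        self.lines.pop(address, None)


memory = SharedMemory()
cpu = Cache("cpu", memory)
accelerator = Cache("accelerator", memory)

cpu.write(0x10, 1)
print(accelerator.read(0x10))  # 1
cpu.write(0x10, 2)
print(accelerator.read(0x10))  # 2, because the stale copy was invalidated
```

Without the invalidation step, the accelerator's second read would return its cached value of 1 rather than the updated value.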
CCIX is aimed at letting accelerators such as GPUs and FPGAs share system memory access with CPUs, making it similar to OpenCAPI. The consortium has also settled on PCI Express as its chosen physical interface, but hints that the protocol may be used over additional physical interfaces in future.
“Using PCI Express to transport the CCIX coherency protocol eases the implementation of CCIX in processors and accelerators as well as the deployment of CCIX technology in existing servers by leveraging today’s existing hardware and software infrastructure,” the CCIX Consortium states in an FAQ about the technology.
In fact, both CCIX and OpenCAPI protocols could be carried over a Gen-Z connection, according to Bowman.
“Gen-Z is a very nice transport for existing and future coherency protocols, and you can see that in some of the collaboration work we are doing with other new bus standards,” he says.
Naturally, it remains to be seen how successful or widely adopted these new standards will be without Intel’s backing, but Gen-Z appears to bring a genuinely useful and much-needed new capability to the datacentre stack, and could form a key part of the composable infrastructure approach that several industry players are working towards implementing.
Add to this the fact that many of the same big industry names are involved with two or all three initiatives, and you have a concerted push to drive key new server technologies in a more open direction, rather than the industry continuing to be dominated by Intel’s intellectual property.