When EMC announced the Virtual Matrix Architecture last week, it drew attention to an aspect of system architecture that usually passes without notice: the backplane bus.
While standardised backplanes have been around for more than 20 years, they have often failed to gain traction in the world of enterprise computing. By announcing that its VMA would use the RapidIO backplane interconnect standard, EMC joins a small number of companies that can envisage their equipment breaking out of the proprietary islands in which most of the IT world still lives.
The ubiquity of Ethernet, TCP/IP, USB, and the PCI bus gives an impression of standardisation that is, in some ways, misleading. In fact, most enterprise IT lives in islands, little soap bubbles of technology that only interact with each other over Ethernet (or, in the case of storage, Fibre Channel).
A look inside a data centre demonstrates the remaining isolation of systems. Blade servers live on vendor-specific backplanes, as do storage systems, switches, and routers. At the same time, vendors and users look for faster interconnects so that the links between storage, servers, switches and routers don't themselves become the bottlenecks limiting system performance.
One way to overcome those bottlenecks would be to break down the barriers between systems – to get them talking directly over their backplanes, instead of seeking higher-performance system interconnection over the LAN or the SAN.
This would, however, need vendors to adopt two things: standards for the physical design of the backplane, and standardised protocols for talking over that physical backplane.
It's this second requirement, the standardisation of the protocols, that RapidIO primarily addresses.
Thomas Cox, executive director of the RapidIO Trade Association, explains.
“The principle that we built RapidIO on came from our experiences with buses, and from trying to apply that experience to switched buses.”
Traditional buses focus solely on the physical connection, he explained. “The control signals are all physical – the type of operation, whether it's a read or a write; acknowledgments between devices; devices identifying themselves – traditionally, that's all done with electrical signals.”
To cope with peak demand on all these signals, designers of physical buses have traditionally over-provisioned the bus. Cox says this isn't a good use of resources when so many protocols offer reliable ways to switch signals between their source and destination.
“So our major goal was to get the highest possible signalling, using the fewest number of pins.” Ethernet-style signalling provided the example that RapidIO drew on, he said.
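To illustrate Cox's point that information once carried on dedicated control wires (operation type, device identity, acknowledgments) can instead travel as data, here is a minimal Python sketch of a packet whose header fields stand in for those signals. The field layout, the `Op` codes and the `Packet` class are hypothetical illustrations, not RapidIO's actual packet format.

```python
import struct
from dataclasses import dataclass
from enum import IntEnum


class Op(IntEnum):
    # Hypothetical operation codes; NOT RapidIO's real encoding.
    READ = 1
    WRITE = 2
    RESPONSE = 3


@dataclass
class Packet:
    dest_id: int   # replaces chip-select style wiring
    src_id: int    # replaces device-identification signals
    op: Op         # replaces read/write/acknowledge control lines
    address: int
    payload: bytes = b""

    # Hypothetical big-endian header: dest (2), src (2), op (1), address (4).
    # Not annotated, so dataclass treats it as a class constant, not a field.
    HEADER = struct.Struct(">HHBI")

    def encode(self) -> bytes:
        header = self.HEADER.pack(self.dest_id, self.src_id, self.op, self.address)
        return header + self.payload

    @classmethod
    def decode(cls, data: bytes) -> "Packet":
        dest, src, op, addr = cls.HEADER.unpack_from(data)
        return cls(dest, src, Op(op), addr, data[cls.HEADER.size:])


if __name__ == "__main__":
    pkt = Packet(dest_id=0x10, src_id=0x01, op=Op.WRITE, address=0x1000, payload=b"\xde\xad")
    wire = pkt.encode()
    assert Packet.decode(wire) == pkt
```

Every "signal" now travels over the same few data pins, which is what lets a packet-based design trade dedicated wires for header bits.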
It's also designed to simplify life for silicon designers, Cox explained. Because a standard logical layer can run over different physical layers, chips can be designed to use RapidIO where it suits the application, while the same pins remain available for other buses if required. The benefit flows on to system designers, who can adopt RapidIO without changing their choice of silicon.
“The philosophy is to use common physical layers instead of inventing new ones – the silicon vendor can implement different physical layers on-chip, and let the user select what the different physical pins will be used for, whether it's PCI Express, RapidIO or Ethernet.”
Speed and Reliability
The aims of RapidIO sound familiar: it's designed to provide high-speed communications between components on a bus, with a current capacity of 10 Gbps and enhancements planned; high reliability (a critical requirement on the inside of any system); and, importantly, traffic management so that systems built on the standard can give priority to traffic that needs it.
“Because we were focussed on embedded systems, we wanted to maintain low latency and guaranteed delivery.
“The switches don't have to pick apart the whole protocol, they just pass the messages on, which gives us low latency. And they don't have to disassemble and reassemble the packets.”
As a result, he said, switches within the fabric can be extremely simple, which keeps latency low.
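A toy Python model of that idea: the switch reads only the destination ID and hands the packet, untouched, to an output port. The assumption that the destination ID sits in the first two bytes, and the `SimpleSwitch` class itself, are hypothetical. Real switches do this in hardware and may begin transmitting before the whole packet has arrived (cut-through); this sketch shows only the lookup side.

```python
from typing import Dict, List


class SimpleSwitch:
    """Toy switch that forwards on the destination ID alone (hypothetical layout)."""

    def __init__(self, routing_table: Dict[int, int], num_ports: int):
        self.routing_table = routing_table  # dest_id -> output port
        self.ports: List[List[bytes]] = [[] for _ in range(num_ports)]

    def forward(self, packet: bytes) -> int:
        # Assumption: destination ID is the first two bytes, big-endian.
        dest_id = int.from_bytes(packet[:2], "big")
        port = self.routing_table[dest_id]
        # The rest of the packet is passed on as-is: the switch never parses
        # the operation type or payload, and never disassembles the packet.
        self.ports[port].append(packet)
        return port
```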
Replacing the unswitched, physical pin-to-pin bus architecture with a logical architecture also enables traffic prioritisation within the switch, using standard mechanisms.
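One common way to prioritise traffic inside a switch is a strict-priority output queue, sketched below in Python. The choice of four service classes (0 lowest, 3 highest) and the strict-priority policy are assumptions for illustration, not a description of RapidIO's actual scheduling rules.

```python
from collections import deque
from typing import Deque, List, Optional


class PriorityOutputQueue:
    """Strict-priority output queue, FIFO within each service class."""

    def __init__(self, levels: int = 4):
        # Assumption: 'levels' classes, 0 = lowest, levels - 1 = highest.
        self.queues: List[Deque[str]] = [deque() for _ in range(levels)]

    def enqueue(self, packet: str, priority: int) -> None:
        self.queues[priority].append(packet)

    def dequeue(self) -> Optional[str]:
        # Always serve the highest-priority non-empty class first.
        for queue in reversed(self.queues):
            if queue:
                return queue.popleft()
        return None
```

Strict priority can starve the lower classes under sustained load, which is one reason designs pair service classes with flow control rather than relying on priority alone.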
“Within the fabric, priority is based on applications,” Cox said.
“In the past, people would just overprovision the architecture so that there would never be a collision. But that's not a wise way to use the bandwidth – it's better to create flow control and service classes to ensure the quality of service.
“In RapidIO, everything is virtualised. You don't have to care about how many [bus] switches the traffic passes through, because that virtualisation allows traffic to be re-routed around congested areas.”
The RapidIO monitoring system can watch for congestion at any point, Cox explained, and reroute traffic to ensure its delivery.
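The article does not say how alternative routes are chosen. As a hypothetical illustration only, a switch that knows several ports can reach a destination might pick the one with the shortest output queue. The routing structures and the least-queue-depth policy below are assumptions.

```python
from typing import Dict, List


def choose_port(dest_id: int,
                candidate_ports: Dict[int, List[int]],
                queue_depth: Dict[int, int]) -> int:
    """Pick the least-congested port among those that can reach dest_id."""
    ports = candidate_ports[dest_id]
    # min() returns the first port on ties, so the preferred (first-listed)
    # route is kept unless an alternative is strictly less congested.
    return min(ports, key=lambda p: queue_depth[p])
```

Sending packets from the same source over different paths can deliver them out of order, so any such scheme has to account for ordering requirements.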
RapidIO is designed to facilitate communications on a board-to-board basis or a chip-to-chip basis. As a result, it's not designed for distance: where LANs talk in tens to hundreds of metres, backplane standards look at how to communicate over distances of around 80 to 100 centimetres.
That's one of the reasons it's easier to get very high speed, since distance is the enemy of speed. Within systems, the communications channels will be on the copper on the circuit board (or, in some cases, in-box fibre optics) – and although this isn't part of RapidIO's brief, the principles of designing circuit boards to minimise on-track noise are well-known.
The components for which RapidIO lays down communications protocols will also be more familiar to people who know how the inside of a PC works than to those who look at LANs and WANs. RapidIO is looking at how to communicate between system components such as the CPU, shared memory, and I/O boards.
The specification centres around a common transport – the logical bus, so to speak – that communicates with systems defined by:
- Logical specifications – defining how I/O, global shared memory, messaging systems, flow control and data streaming communicate with the common transport; and
- Physical specifications – the physical layers currently supported. These include two parallel bus specifications (eight bits wide and 16 bits wide) and serial links (up to four of which can be bonded together to attain the 10 Gbps mentioned earlier), with room to add further physical layers in future.
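How four bonded serial lanes reach 10 Gbps can be checked with simple arithmetic. This sketch assumes a lane signalling rate of 3.125 Gbaud and 8B/10B line coding (10 line bits for every 8 data bits); the function name is illustrative. The result is the raw data rate before packet headers and CRC are subtracted.

```python
def effective_data_rate_gbps(lanes: int, baud_gbaud: float,
                             data_bits: int = 8, code_bits: int = 10) -> float:
    """Data-carrying bit rate of bonded serial lanes using a block line code.

    With 8B/10B, only data_bits / code_bits = 80% of the raw signalling
    rate carries data; the rest is coding overhead.
    """
    return lanes * baud_gbaud * data_bits / code_bits


print(effective_data_rate_gbps(4, 3.125))  # 10.0
print(effective_data_rate_gbps(1, 3.125))  # 2.5
```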
As with modern LAN, WAN and Internet specifications, the separation of the lower layers (how you design the electrical circuits that communicate) from the logical specifications is significant: either side can change without forcing changes on the other.
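The value of that separation can be shown with a small Python sketch in which one logical operation runs unchanged over two interchangeable physical layers. The class names, the 4-byte address framing and the byte-striping scheme are hypothetical simplifications, not the RapidIO wire formats.

```python
from abc import ABC, abstractmethod
from typing import List


class PhysicalLayer(ABC):
    """Hypothetical interface: how bytes are put onto the wire(s)."""

    @abstractmethod
    def transmit(self, frame: bytes) -> List[bytes]:
        """Return the units actually placed on the wire(s)."""


class ParallelPhy(PhysicalLayer):
    def __init__(self, width_bits: int):
        self.width_bytes = width_bits // 8  # 8-bit or 16-bit wide port

    def transmit(self, frame: bytes) -> List[bytes]:
        # One word per clock across the parallel data lines.
        w = self.width_bytes
        return [frame[i:i + w] for i in range(0, len(frame), w)]


class SerialPhy(PhysicalLayer):
    def __init__(self, lanes: int):
        self.lanes = lanes  # e.g. 1 or 4 bonded lanes

    def transmit(self, frame: bytes) -> List[bytes]:
        # Stripe bytes round-robin across the bonded lanes.
        return [frame[lane::self.lanes] for lane in range(self.lanes)]


def logical_write(phy: PhysicalLayer, address: int, data: bytes) -> List[bytes]:
    """Logical layer: builds the request and knows nothing about the wires."""
    frame = address.to_bytes(4, "big") + data
    return phy.transmit(frame)
```

`logical_write` is identical whichever physical layer it is handed; swapping a parallel port for bonded serial lanes touches only the object passed in.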
In their early days, backplane bus systems served essentially as “pin extenders”: the bus extended the physical reach of the CPU's pinout. For example, today's elder statesman of bus technologies, VMEbus, started out providing pin extension for Motorola's 68000-series processors.
Modern bus systems like RapidIO, by separating the logical from the physical (and by putting the bus, rather than the CPU, at the centre of the picture), relieve the load on the “master” CPU. But there are other, more important advantages.
Life is made easier for system designers, since they need to learn a protocol rather than the intimate operation of particular CPUs. Moreover, since the bus is “untied” from a particular CPU, it's easier for the bus to take on a multivendor role.
These kinds of advantages have long been recognised in the target markets. But this brings us back to the significance of EMC's adoption of the standard: it's a sign that RapidIO may be able to do something rare among bus systems, and break into the enterprise systems space.
The Adoption Curve
Such adoption depends on many factors, but most of all on enough vendors, of the right kind, seeing advantages in RapidIO that outweigh the lure of the proprietary system.
“We would like to see widespread adoption,” Cox said, “but it would take a lot of vendors working together to create that critical mass.
“At some point, we believed the storage community would start to take notice, but unless there's an interesting product to talk about, it's hard to get people's attention.”
There are, however, reasons that in a post-crash world, adoption of standard buses could gain momentum.
“In the world of telecommunications, people were building systems on proprietary internal buses. So if they were designing a new system, they would have engineers designing a proprietary backplane to support the new system.
“But the industry doesn't have those engineers any more, and that's why standard architectures are coming into play.”
The accelerating mobile market makes the issue acute, Cox said. The telco vendors find themselves short-handed in engineering at a time when mobile base stations are expected to move quickly from 2.5G to 3G to 4G and beyond – and a standardised bus can relieve much of the development effort needed to roll out a new system.
Cox is hoping that similar arguments could gain currency among enterprise system vendors, and believes this argument is at least partly behind the EMC decision to adopt RapidIO.
Reliability is another key part of the system, and is worthy of a little examination. Reliability characteristics cited by RapidIO's designers include:
- Redundancy – with support for a variety of schemes for hot-sparing;
- Hot-swap support;
- Fault detection – covering both communications (with CRC and 8B/10B encoding) and the system level (with monitors designed to detect system degradation);
- Physical layer handshaking to provide fault isolation; and
- Support in the system's internal routing for automatic fault isolation.
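The CRC check in the fault-detection item can be sketched in Python. This assumes a 16-bit CRC using the CCITT polynomial 0x1021 (x^16 + x^12 + x^5 + 1), an initial value of 0xFFFF, and a hypothetical frame whose last two bytes carry the CRC; the RapidIO specification should be consulted for exactly which fields are covered and where the CRC is placed.

```python
def crc16_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
    """Bitwise CRC-16 with polynomial 0x1021, processed MSB first."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def frame_ok(frame: bytes) -> bool:
    """Receiver check; assumes the last two bytes hold the sender's CRC."""
    body, received = frame[:-2], int.from_bytes(frame[-2:], "big")
    return crc16_ccitt(body) == received


if __name__ == "__main__":
    payload = b"hello"
    frame = payload + crc16_ccitt(payload).to_bytes(2, "big")
    corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]  # flip one bit
    print(frame_ok(frame), frame_ok(corrupted))  # True False
```

A receiver that detects a mismatch discards the frame; in a link-level scheme like the one described, the sender can then retransmit, which is how errors are caught before they propagate into the system.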
It may once have been argued that “carrier grade” reliability is in excess of enterprise requirements – but that's no longer true today. And here is another reason that vendors might find a neutral, standard bus worth looking at.
It may well be, as various vendors will claim, that their products have five-nines reliability. The problem is that in the data centre, as I've noted before, systems multiply, and every new, disparate system lowers the reliability of the data centre as a whole.
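The arithmetic behind this point is simple multiplication. The sketch below assumes failures are independent and that a service needs every system in its chain to be up at once (both simplifications); availability is then the product of the individual figures. The function names are illustrative.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes


def series_availability(*availabilities: float) -> float:
    """Availability when every component must be up (independent failures)."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def downtime_minutes_per_year(availability: float) -> float:
    return (1.0 - availability) * MINUTES_PER_YEAR


five_nines = 0.99999
print(round(downtime_minutes_per_year(five_nines), 1))  # 5.3
five_in_series = series_availability(*[five_nines] * 5)
print(round(downtime_minutes_per_year(five_in_series), 1))  # 26.3
```

Five separate five-nines systems chained together deliver roughly four nines: about five times the expected downtime of any one of them.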
If a whole, encapsulated environment – say, a large-scale database environment with a number of blade servers, storage management, the storage itself, and the networking components – can be built as a single, five-nines rack, it will have the long-term advantage over a data centre that's a patchwork of systems.
Time will tell whether those kinds of advantages suffice to drag enterprise vendors into the world of standard backplanes.