Making more of Moore's Law

Keeping to the rule that processing power will double every 18 months is a tough challenge, but developers have a few tricks up their sleeves in the battle to create ever more powerful computers, as Danny Bradbury explains

It is funny how some quotes in history foreshadow others. In 1939, Sonya Levien, screenwriter of The Hunchback of Notre Dame, wrote, "All small things overmaster the great." Nearly 70 years later, everyone in the computing industry is trying to prove her wrong.

It is the microscopic transistors on microchips that constantly threaten our ability to grow. In 1965, Gordon Moore argued that the number of transistors you could fit on a chip for an optimal cost would double every year, later broadening this to 18 months. Moore, who would later co-found Intel, was as surprised as anyone when his statement became an industry doctrine. Why? Because the economics of the computer industry mean that its growth has historically depended on processor muscle power, which in turn relies at least in part on transistor count.

You cannot sell more powerful servers - or the increasingly complex software that runs on them - without faster, more functional chips. Moore's Law will have to fail as silicon matures to the point where innovation stops delivering enhancements in transistor technology. The challenge for today's companies is to postpone that day for as long as possible.

"The single most important problem is power density," says Bijan Davari, vice-president of next generation computing systems and technology at IBM's Thomas J Watson Research Centre. Companies fit more transistors on a chip by reducing the die size - the template used to burn transistors onto the silicon. Smaller die sizes concentrate more power, and therefore more heat, into a smaller area.

With current levels of innovation, the problem is becoming critical. Die sizes of 130 nanometres are common today, but Intel moved to 90nm fabrication in March. "We have outlined the roadmap through 65, 50 and even 30nm," says Intel's European server manager, Alan Priestley. The industry has to innovate and this is happening in three areas: processor design and fabrication, server hardware, and software.
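To put a rough number on the power density problem, the sketch below works through the geometry of a shrink such as the move from 130nm to 90nm. It rests on a deliberately simple assumption that is not from the article: the total power stays constant while the linear dimension shrinks, so the same power lands on a smaller area. Real chips also change voltage and design with each generation, so treat the result as an illustration of the trend rather than a measurement. The function name is hypothetical.

```python
# Illustrative only: assumes total power is unchanged while the linear
# dimension shrinks, so density rises purely because the area falls.

def power_density_ratio(old_nm: float, new_nm: float) -> float:
    """Return the factor by which power density rises after a shrink.

    Area scales with the square of the linear dimension, so at constant
    power the density (power / area) scales with (old / new) ** 2.
    """
    if old_nm <= 0 or new_nm <= 0:
        raise ValueError("sizes must be positive")
    return (old_nm / new_nm) ** 2


if __name__ == "__main__":
    # Steps taken from the roadmap figures quoted in the text.
    for old, new in [(130, 90), (90, 65), (65, 30)]:
        ratio = power_density_ratio(old, new)
        print(f"{old}nm -> {new}nm: density up about {ratio:.2f}x")
```

Under that assumption, a single step from 130nm to 90nm roughly doubles the power packed into each unit of area, which is why each new generation makes the heat problem harder.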

"At the semiconductor level it is the introduction of new materials and structures," says Davari. "It is things such as silicon on insulator - by which you can get performance without significantly increasing power dissipation."

Silicon on insulator insulates the transistor from the underlying silicon to reduce the electrical charge capacity of the transistor. This means that transistors can turn on and off more quickly, improving performance. Other technologies such as strained silicon put the silicon onto another substrate material with atoms spaced further apart. This causes the silicon's atoms to move further apart, "straining" it and reducing the electrical resistance.

Despite innovation in materials, companies cannot keep increasing the power through the processor without running into trouble, says Priestley. The higher the clock rate of the chip, the worse the dissipation problem will be. Instead, chip suppliers are resorting to other technologies to improve performance. One of Intel's secret weapons in the war against chip heat is hyperthreading.

This technology, which first shipped two years ago with the Xeon family before moving to the Pentium 4, makes a single CPU behave like two processors by adding a second set of registers. The result is a single chip that acts like two without using as much power.

Next year, Intel will turn the two virtual chips into two physical processing cores on a single chip, ushering in mass-market multi-core processors. The goal is to increase performance without significantly raising power consumption. The next Itanium processor, dubbed Montecito, will be the company's first dual-core Itanium processor and the company is primed to release dual-core Xeons and Pentiums next year. Sun Microsystems is also developing dual-core technology, but IBM has already shipped the dual-core Power4 chip.

In readying multi-core processors, chip suppliers are mimicking on a single piece of silicon what server suppliers have been doing for years. High-performance server configurations have always been about getting multiple systems working in parallel to compensate for limitations in processor technology.

Clustering technology involves linking server boxes together. Traditionally it has been used for failover purposes in commercial applications, says Gartner's research vice-president Carl Claunch. "If the primary system was not able to continue, one box could take over for the other unit," he says. The other big application for clustering was load balancing, making it ideal for distributing a workload evenly between web servers.

Although companies with very high-performance applications, such as engineering design simulators, do use clusters of machines for processing single tasks, most single-image processing in the commercial sector falls to symmetric multiprocessing machines. Stringing processors together in a single box to work on the same program makes SMP useful for everything from high-throughput transactional databases to analytical processing.

But SMP machines are limited in their scalability because they all access the same memory. This places limitations on the system bus. Better to use proprietary massively parallel processing - although expensive - or non-uniform memory access (Numa) boxes.

Numa machines connect together groups of chips, generally Intel processors on four-way boards. Where necessary, the chips access the shared memory store in the same way traditional SMP machines do. But when possible, they access a different area of memory located on the local board. This takes the pressure away from the system bus and makes it possible to scale up to 32 or even 64 processors in a box.

Unfortunately Numa is not without its own disadvantages. Whereas SMP boxes can generally run server applications out of the box, software often has to be tweaked for Numa. "Numa architectures were not the same," says Claunch. "For some, the penalty of going to distant memory is really big, so if you did not carefully tune the application it would bog down. That was early Numa, but newer Numa machines do not impose such harsh penalties."
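Claunch's point about tuning can be sketched with a simple weighted average. The model below assumes that every memory access is either local or remote, each with a fixed latency. The latency figures are hypothetical, chosen only to contrast a harsh remote penalty with a milder one, and are not measurements of any real machine.

```python
# A minimal model of Numa memory latency. All latency values used in the
# example run are hypothetical and exist only to illustrate the trend.

def average_access_ns(local_fraction: float,
                      local_ns: float,
                      remote_ns: float) -> float:
    """Return the mean latency when a given share of accesses is local."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


if __name__ == "__main__":
    local = 100.0  # assumed latency of memory on the local board
    for label, remote in [("harsh penalty", 1000.0), ("mild penalty", 300.0)]:
        for share in (0.5, 0.9, 0.99):
            avg = average_access_ns(share, local, remote)
            print(f"{label}: {share:.0%} local -> about {avg:.0f}ns average")
```

With a harsh remote penalty, an untuned application that goes off-board for half its accesses spends most of its time waiting, whereas one tuned to keep nearly all of its data local barely notices. With a mild penalty the gap narrows, which matches the distinction drawn between early and newer Numa machines.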

Now the world is about to turn upside down again. Two years ago, the first blade servers began arriving on the market. Blades are single low-footprint servers with no power supply. They plug into a chassis which holds multiple blades, and the chassis handles the power and other back-end cabling.

"They address Moore's Law in the sense that they are about density," says Gary Owen, head of enterprise products at blade supplier Fujitsu-Siemens. You could pack more computing power into a small space with a blade system, but only with the second generation did we start to see multi-processor blades running higher-end processors such as Xeons.

But blades have their drawbacks too, says Chris Ingle, group consultant at IDC. "They have a cut-off point below which they are not economical at all over rack mount, because you have to pay for the chassis.

"You also have to take a supplier-independent approach," says Ingle. This is because there is no standard for the high-speed backplanes in the blade chassis that connect the blades together; if you buy a chassis from a supplier, you have to use that supplier's blades.

Finally, blades are often standalone - there is no way to get them all working on a single task, and the best that can be done is to use a collection of blades as a load balancing cluster. The key to change really lies above processors and server hardware in the computing stack, at the software layer.

Two concepts promise to revolutionise Moore's Law at the software level: virtualisation and grid computing. As they develop, they are merging to become the same thing. Virtualisation was a reaction to the development of complex distributed architectures which replaced mainframe environments. Distributed systems are inefficient because servers often lie idle and other servers work overtime to complete tasks. System administrators can manually reconfigure systems to a certain extent, but it is more efficient to do it in software, says Claunch, who says that we are at a transition point.

Such transition points occur when two statistical curves overlap. In this case the first is Moore's Law, which forces the cost of hardware down, and the second is the cost of human labour, which has been going up at an inflationary rate, he says. "So when you get to a point where it is cheaper to automate than it is to employ human labour, a transition occurs." In short, why pay system administrators to shift workloads around your infrastructure to use up spare clock cycles, when you can get a piece of software to do it for you?
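The crossover Claunch describes can be sketched as two curves, one falling and one rising. The example below assumes that the cost of automating a task halves every 18 months in line with hardware, and that the cost of doing it by hand grows at a fixed annual rate. The starting costs and the 3% growth rate are invented for illustration and do not come from Gartner.

```python
# Sketch of the transition point: find the first year in which a falling
# automation cost drops below a rising labour cost. All figures passed in
# the example run are hypothetical.
from typing import Optional


def crossover_year(automation_cost: float,
                   labour_cost: float,
                   halving_months: float = 18.0,
                   labour_growth: float = 0.03,
                   max_years: int = 50) -> Optional[int]:
    """Return the first whole year when automation is cheaper, or None."""
    for year in range(max_years + 1):
        # Automation cost halves once every `halving_months`.
        automation = automation_cost * 0.5 ** (12 * year / halving_months)
        # Labour cost compounds at `labour_growth` per year.
        labour = labour_cost * (1.0 + labour_growth) ** year
        if automation < labour:
            return year
    return None


if __name__ == "__main__":
    year = crossover_year(automation_cost=400_000, labour_cost=50_000)
    print(f"Automation becomes cheaper in year {year}")
```

Even when automation starts out eight times more expensive than the manual approach, the halving curve overtakes the inflation curve within a handful of years under these assumptions, because exponential decline quickly dominates modest annual growth.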

Virtualisation will be a critical technology development for the uptake of blade servers, says Claunch. Meanwhile, the move towards grid computing continues. Unlike virtualisation, which focuses on increasing efficiency, grid computing tries to turn your distributed architecture into a single computer to enhance performance when dealing with a single task. Companies such as Platform Computing sell software that uses corporate desktops to assist server applications in number crunching.

Oracle's 10g database offers what the company calls commercial grid technology but which is closer to a virtualised system. Administrators can use the system to predefine collections of servers which can run application modules to meet fluctuating demand, says Oracle's Alan Hartwell, vice-president for marketing for the UK and Ireland.

Moore's Law is unlikely to fail any time soon, given the amount that processor suppliers have invested in miniaturising processor technologies. Nevertheless, it will not hurt companies to consider ways of enhancing their processor usage at both the hardware and software layers of the stack. Perhaps this way, we can avoid Sonya Levien's prediction for a while longer while maintaining Moore's own for some time to come.

Data processing in a Trice    

If you are not getting the most out of your single processor box and do not want to pay for an SMP machine, consider a data processing appliance.  

UK start-up Trice is developing processors sold as ISA cards or boxes connected via USB to your server. The chips are electronic programmable logic devices, meaning that they can be reprogrammed every 10 seconds with new logic.  

Trice is combining them with its radiance data filtering algorithm - originally used for star field analysis in space exploration - to process vast amounts of data. A server can hand off an SQL query to the processor, which crunches the data using a database of up to 500Mbytes before handing back a result set.

The company will release the next generation of its technology, called Avalanche, towards the beginning of next year. "With Avalanche we could search the fingerprints of everyone in the UK - all 10 fingers - in four seconds," says Trice chief executive Sean Colson.

At the moment the company is selling the technology to "sensitive US clients" - think governmental agencies. Commercial customers can now buy Trice's current generation card, called the Iceboard, but the library of search algorithms that it has written for the device will need some tweaking. Trice hopes to sell a commercially packaged version next year if business funding allows.   

A history of Moore's Law

Gordon Moore first created his famous law when working for Fairchild Semiconductor in 1965, three years before co-founding Intel. 

"The complexity for minimum component cost has increased at roughly a factor of two per year. Certainly over the short term this rate can be expected to continue, if not increase," he said, referring to the optimal cost of squeezing as many components into a circuit as possible.  

As time went on, interpretations of the law became wider. The time period for doubling transistor complexity became 18 months, or two years, depending on who was interpreting the law, and it was also applied to processor power in general, along with storage capacity.
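How much the choice of doubling period matters becomes clear once it is compounded over a decade. The short sketch below applies the standard doubling formula, growth = 2 ^ (elapsed months / doubling period), to the three interpretations mentioned above. The function name and the 10-year horizon are chosen purely for illustration.

```python
# Compare the three common readings of Moore's Law over the same period.

def growth_factor(years: float, doubling_months: float) -> float:
    """Return the multiple reached after `years` of steady doubling."""
    if doubling_months <= 0:
        raise ValueError("doubling_months must be positive")
    return 2 ** (12 * years / doubling_months)


if __name__ == "__main__":
    for months in (12, 18, 24):
        factor = growth_factor(10, months)
        print(f"Doubling every {months} months: about {factor:,.0f}x in 10 years")
```

Over 10 years, doubling every year gives roughly a 1,024-fold increase, every 18 months about 100-fold, and every two years 32-fold, so the three interpretations of the law diverge by more than an order of magnitude.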
