They say that many hands make light work. On the other hand, too many cooks spoil the broth. These two contrary maxims sum up the pros and cons of distributed computing. By putting multiple machines at the disposal of a workload, the combined horsepower should speed up its execution.
But dispersing jobs, or even splitting up just one job, around a network of computers can become so complicated and clumsy that the advantages of multiprocessing dissipate or even evaporate. Optimising throughput in a distributed environment is the key to overall performance.
What users want is a way of harnessing more effectively the considerable amount of horsepower sitting on desktop machines and on their behind-the-scenes servers, by regarding them as a networked cluster capable of sharing work amongst each other in an emulation of parallel processing.
But how? In a distributed network, there needs to be more than just the bog-standard operating system belonging to each individual machine running the show. System software that can view the bigger picture of the network configuration as a whole must be used.
It's a problem that has dogged parallel processing for decades: how to harness the power of more than one processor simultaneously.
There is, of course, no logical difference between a computer built to a parallel-processing architecture with distributed memory - each processing node has its own allocation of memory which it does not share with any other node - and a distributed network, or cluster, of standalone computers.
But the physical difference is significant for two reasons. First, there will be no overall operating system specifically designed for a distributed-memory-architecture parallel processor.
Second, there is the physical remoteness of each node in a networked cluster, even if they are only separated by a desk, which means that the interconnects between nodes are nothing more than local area networks.
The reason Seymour Cray designed his eponymous supercomputer machines to be circular was not to win design awards but to reduce the physical distance between components to a minimum, so nodal interconnect distance and speed to performance are critical.
In practice, a third difference between monolithic parallel processing and desktop clustering is that most users buy boxes from more than one supplier. This means that any operating system that is extended to support multiprocessing throughput on a cluster of networked desktops has to be able to cope with different types of hardware and their associated internal operating systems.
"There are very few single-vendor sites," says Bill McMillan, senior technical consultant at Canadian company Platform Computing, which specialises in high-performance distributed computing.
That means manufacturers such as Hewlett-Packard and Sun "are unlikely to put [support for heterogeneous clustering] into their own operating systems - most are focusing on high-availability, failover clustering", McMillan adds.
For that reason, he says, computer manufacturers using Windows NT and Unix space are happy to let a company like Platform Computing fill the gap with its third-party system software LSF (load-sharing facility). "Most suppliers endorse and resell LSF," he says.
The function of this software, which was developed at Berkeley in the US and modelled on the Cray Network Queuing System, is to take over where the internal operating systems stop, effectively acting as a distributed operating system which can see the entire network resources and organise and control their deployment against the software that needs to be run. This aims to ensure that workloads are balanced out.
LSF modules sit on each machine on both desktops and servers, and are themselves controlled by the central scheduling module of the software (the latter is also capable of running on any of the machines in the cluster - "we have our own internal load balancing," says McMillan.)
Matching the power to the jobs
The task of LSF is to match the computing power of the networked cluster to the multiplicity of jobs that have to be done. It does so knowing the computing power available at any one time, the workload and, crucially, the priority of the components of that workload.
Not unnaturally, in a distributed environment with multiple users, most users like to think that their own tasks are the most important. LSF - and the system administrator - has to take a loftier view. Without a clear overview of system resources and job priorities, users like to select for themselves what they are going to run their jobs on. Naturally they make their choices based on their own logic and perceptions of the appropriate level of computer power.
Equally naturally they will prefer to grab the most powerful machine around to execute their own jobs fastest. But this can result in the most powerful server being used for tasks that actually can be run on a computer with less capacity.
Users make their choices independently of each other but, since no one can know what jobs their colleagues are thinking of running, the fastest machine easily gets overburdened when several users want to run capacity-hungry jobs on it. The jobs in the machine are switched back and forth between the processors and discs as they try to share the available capacity, resulting in all the jobs running more slowly.
Meanwhile the queue of waiting jobs continues to grow. The fastest machine suddenly develops a bottleneck as a result of the lack of co-ordination of needs and resources. When each individual user tries to utilise the available resources optimally for his or her jobs, the system as a whole can easily become sub-optimised, pleasing no one.
With LSF in charge, things are less chaotic. The software controls the job-queuing system against priorities predefined by the system administrator. The software optimises the number of jobs being run on a server to ensure they all run as fast as possible. When the job is finished, LSF ensures that the server immediately starts new jobs and does not sit around idling unproductively.
Queues can be categorised in several ways: jobs with high priority, ordinary jobs or jobs that can be run overnight. It is also possible to set up queues for larger, capacity-hungry jobs and others for shorter, less demanding jobs.
While queues are being constructed and jobs in them peeled off, another LSF module, LSF Base, monitors what is happening across the cluster and gives an overview of use of resources, looking both for potential bottlenecks and periods of idling. LSF can spot when a particular end-user logs off and goes to lunch, and can immediately pass another job to his machine. When the end-user returns and logs back in, the job is moved to another processor and their own machine reverts to them.
"Most people's desktops are used around 15% of the time. A cluster can be used 90% of the time," points out McMillan.
Ironically, however, these days it is not so much the rate of use of the hardware resources that is of most concern to users, but the rate of use of their application software. With makers of chips, cars and even films, environments in which distributed network clusters are popular, high-performance applications such as circuit design or graphics processing are not only processor-intensive but also costly. One of the key drivers for making clusters work without idle capacity is keeping these expensive software applications running as long as possible.
A software licence is a 24x7 payout, so to get the maximum return on investment the software needs to be running as close as possible to 24x7. "We keep the licences busy," says McMillan.
By keeping the cluster highly load-balanced, users not only get more from their licence in terms of each job running at peak speed, but more jobs overall. Software productivity is maximised. This actually makes it easier, argues McMillan, for users to justify buying more software.
"If you only use software 10% of the time it's very difficult to justify buying more, but if you can use it 100% of the time that's a different story."
Case study: MillFilm spreads Gladiator's effects load
MillFilm is the visual effects company that did the post-production work on the newly released Ridley Scott film, Gladiator. The Coliseum and the crowds inside might look like the real thing to the cinema audience, but in fact they are all computer-generated. Nor was Russell Crowe actually on set with live tigers - Crowe and the big cats were shot separately, then melded on computer.
The computing demands for film visual effects are phenomenal. Each second of film requires 24 frames, and each frame requires computer-intensive graphics rendering - putting in naturalistic light and shade - to make it work in the cinema. The more computing power and time spent on rendering, the better quality the finished image. One feature film, says Bill McMillan, senior technical consultant at Canadian company Platform Computing, can clock up 2 terabytes of disc space, and one frame can take between 10 minutes and 20 hours to produce, depending on the complexity.
"When we did the Coliseum it took between 30 minutes and 60 minutes [of computer time] to render one frame," says David Lomax, head of 3D at MillFilm. That's a lot of computation.
MillFilm did it with a lot of computers. "The animators used Silicon Graphics workstations and then the final rendering was done on a render farm, a collection of Silicon Graphics and Linux boxes," says Lomax. "At our peak we were using around 50 processors."
As the deadline for post-productionloomed, MillFilm realised thatitmade sensetosteal cycles from the machines outside the dedicated render farm processors.Platform Computing's load-sharing facility (LSF)was used to manage this extra capacity. "LSF found any machine not doing anything and sent it a [rendering] job," says Lomax.
Without LSF, it would have required manual intervention by the rendering operators to keep an eye open for idling machines, inevitably wasting more machine time and their own time.
"LSF took out the manual effort," says Lomax. "If we can make optimum use of [processor power], we can make better images. If we waste processor time then frame times have to be lower so the image won't look as good as possible. The more processing power the better the image."
Case study: Ericsson keeps them working
Swedish telecoms giant Ericsson uses a distributed Sun environment for the design of its computerised circuits for its Axe exchanges and again, the challenge is to make the most of the horsepower and the software licence costs by dynamically managing the workload going through the network.
The circuits have to be designed by simulating their function on computer, and each simulation can take between a few seconds and a few minutes, sometimes considerably longer. By using LSF, Ericsson can utilise all the servers in the network for simulations, including ordinary desktop systems, when they are not being used for office work. If an end-user wants his desktop back, the simulation has to be moved to another machine, so a few jobs need to be re-run, but overall, using LSF, the collective utilisation of the resources in the cluster increases by more than 50%.
Ericsson's Barbara Remp says, "We can get much shorter turnaround times for computer jobs and thereby there is a considerable increase in the number of jobs we can do in a fixed period."
This was first published in June 2000