Before you get locked into the grid of distributed or P2P
computing, don't dismiss the advantages that using a supercomputer
can bring. Computer Weekly finds that the two approaches to massive
number-crunching applications can be complementary.
The prospect of gathering unused processing power from various
parts of a network and using it to tackle large computational tasks
has created a lot of excitement recently. If that network is the
Internet, the amount of processing power on offer is theoretically
limitless. When the task is along the lines of finding a cure for
cancer (the aim of a project mounted by Oxford University, Intel
and others), many PC owners donate their spare capacity for
free.
Companies can use the same peer-to-peer (P2P) approach to harness
unused power on their own desktops for number-crunching tasks.
United Devices, one of the companies behind the cancer project,
also sells software for this
purpose. Senior product manager Robert Reynolds says, "Using this
solution instead of buying a supercomputer gives you a big benefit
in terms of scalability, both absolute and incremental."
Although the processors in a supercomputer are typically two to three
times faster than those in a PC, they may number only thousands or
tens of thousands, whereas United Devices has scaled its Internet-based
service to more than 650,000 devices.
The company claims that this approach can deliver more processing
power than a supercomputer, for perhaps a tenth of the pro rata
cost. And because processing does not depend on the performance of
a single core system, any number of processors can be added. The
resources are automatically upgraded as companies replace their
desktop hardware, so your "virtual supercomputer" should grow in
line with Moore's Law.
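As a rough illustration of the scale argument, the figures quoted above can be put on a common footing. The sketch below is a back-of-envelope model, not United Devices' own calculation: the function name `pc_equivalents` and the 10% availability figure for donated PCs are assumptions made for the example, and it ignores co-ordination overhead entirely.

```python
def pc_equivalents(processors: int, relative_speed: float = 1.0,
                   availability: float = 1.0) -> float:
    """Aggregate capacity in units of one fully available desktop PC."""
    return processors * relative_speed * availability


if __name__ == "__main__":
    # From the article: tens of thousands of processors, each up to
    # three times faster than a PC, against 650,000 donated devices.
    supercomputer = pc_equivalents(10_000, relative_speed=3.0)
    # ASSUMPTION: each donated PC is free for only 10% of the time.
    p2p = pc_equivalents(650_000, availability=0.10)
    print(f"supercomputer: {supercomputer:,.0f} PC-equivalents")
    print(f"P2P network:   {p2p:,.0f} PC-equivalents")
```

Even with a pessimistic availability figure, the sheer number of devices can outweigh faster individual processors, provided the task can be divided up cleanly.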
Why aren't all large computational tasks being tackled this way?
Current projects - such as GeneProt's identification of proteins
for therapeutic exploitation - are still adopting approaches which,
at first sight, are closer to traditional supercomputing. It is an
interesting choice given the thematic similarities between
GeneProt's work and the Oxford University cancer project.
There is agreement in both P2P and supercomputer camps that the P2P
approach works for some situations but not others.
Dominique Gillot, Compaq's life science manager for Europe, says,
"Some applications can easily be split so that they will run on
1,000 or more central processing units. They have to be written in
such a way that it doesn't matter too much exactly when the
computer to which a particular task has been allocated responds.
It's like distributed batch processing: it only works if you don't
need the answer to one task before you begin on the next."
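The kind of application Gillot describes can be sketched in a few lines. In this minimal example, threads on one machine stand in for remote PCs, and `score_candidate` is a hypothetical placeholder for one unit of work; neither is taken from any of the projects mentioned. The point is that no task needs another task's answer, so results can be collected in whatever order they come back.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def score_candidate(candidate_id: int) -> int:
    """Hypothetical stand-in for one independent unit of work."""
    return sum(i * i for i in range(candidate_id)) % 97


def run_independent_tasks(candidate_ids, workers: int = 4) -> dict:
    """Farm tasks out and gather results in whatever order they finish."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map each pending future back to the task it belongs to.
        futures = {pool.submit(score_candidate, c): c for c in candidate_ids}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results


if __name__ == "__main__":
    scores = run_independent_tasks(range(1, 9))
    print(sorted(scores.items()))
```

If one task needed the output of another, the workers would have to wait on each other and the benefit of spreading the work would shrink accordingly.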
Horacio Zambrano, another senior product manager at United Devices,
says that whether a task is suitable for decentralised processing
depends, among other things, on weighing up the overhead of
co-ordination against scalability.
"The cancer application requires very little data - the computation
to communication ratio is high - and so it's highly suitable,"
Zambrano explains.
An application that requires constant communication between
processes running on different processors would be less suitable
because of the time taken for information to get from one node to
another. "Data parallel" is the name that United Devices uses for
the class of applications suited to a distributed, decentralised
approach.
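Zambrano's computation to communication ratio can be made concrete with a simple model. The sketch below assumes that communication time is a fixed latency per message plus the data volume divided by the link bandwidth. The function name and all the figures in the example are illustrative assumptions, not measurements from United Devices.

```python
def compute_to_comm_ratio(compute_seconds: float, bytes_moved: float,
                          bandwidth_bytes_per_s: float,
                          latency_s: float = 0.0, messages: int = 1) -> float:
    """Seconds spent computing per second spent communicating."""
    comm_seconds = messages * latency_s + bytes_moved / bandwidth_bytes_per_s
    return compute_seconds / comm_seconds


if __name__ == "__main__":
    # ASSUMED figures: one hour of work, a 50 kB/s link, 0.1 s latency.
    # Loosely coupled: a single 1 MB work unit sent once.
    loose = compute_to_comm_ratio(3600, 1_000_000, 50_000, 0.1, messages=1)
    # Tightly coupled: 100,000 messages of 1 kB exchanged during the run.
    tight = compute_to_comm_ratio(3600, 100_000_000, 50_000, 0.1,
                                  messages=100_000)
    print(f"loosely coupled ratio: {loose:.1f}")
    print(f"tightly coupled ratio: {tight:.2f}")
```

A ratio far above one suggests the task will spread well over slow links; a ratio below one means the processors would spend most of their time waiting for data.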
Confidentiality of data can be an additional constraint on this
architecture, particularly where the data is to be sent over the
Internet. However, as Gillot points out, when the data is split
down into so many small chunks, the likelihood of being able to
intercept and piece it together is small.
There is little direct competition between P2P and centralised
approaches. In fact, a closer look reveals that most supercomputers
have less in common with the heroic supercomputers of old, and more
in common with the distributed approaches, than appears at first
sight. Both rely on vast numbers of standard processors working in
parallel to tackle large tasks. Gillot makes the point that the
hardware architecture adopted by GeneProt does not even have to be
physically centralised.
"Applications like this can be on one site or distributed over two
or more locations. As for storage, the storage area network should
be viewed as a service and it can be provided from anywhere, as
long as it remains secure and accessible. You need to build
multiple access paths so that if one goes down you can use another
route," says Gillot. He mentions Compaq's collaboration on the
Oracle Parallel Server as being designed to facilitate that type of
distribution.
In the case of GeneProt, Gillot points out that while the data is
held on site along with about 50 mass spectrometers that are
searching for proteins, the processor farm is hosted at a Compaq
building, with a fast fibre link between them. The decision to host
the processors there was made on the grounds of space and skills.
The data, on the other hand, is the firm's stock-in-trade. In
future, it would be possible for a company like this to divide the
processing task between several sites in different parts of the
world, although security considerations may in practice limit the
extent to which such a company would want to distribute its
data.
Like P2P, then, this type of supercomputer approach relies on
grouping together large numbers of standard components - not
necessarily physically all together - working to provide
supercomputing power.
The ultimate in this style of commoditised supercomputing is, for
many techies, Beowulf architecture: PC clusters running Linux. But
Mark Parsons, commercial manager at the Edinburgh Parallel
Computing Centre, says, "Although zealots claim Beowulf can solve
anything, there's still a need for large specialised machines
because of the fundamental problem of getting the processors to
speak to one another quickly enough." While there are ways to speed
this communication up, their expense tends to cancel out the price
advantage.
This problem of connectivity between system elements is the main
reason why specialist supercomputers are still around. Companies
such as Silicon Graphics and Cray still feature prominently on the
list of the world's top supercomputers compiled by the universities
of Mannheim and Tennessee, while IBM develops specialist
supercomputers alongside its cluster-based solutions. Still number
one on the June 2001 list was a supercomputer called ASCI White,
which was built by IBM for the US Accelerated Strategic Computing
Initiative. It is designed for use in nuclear testing simulation
and is capable of 12 trillion calculations per second.
But even solutions like this rely heavily on components that are
used in lesser computers - RS/6000 processors in the case of ASCI
White. Ulla Thiel, who leads IBM's scientific and computing sector,
says, "The tendency is to base the solution on standard parts and
processors, using specialised software and hardware to provide the
fast interconnects required."
IBM, responsible for 40% of the top 500 sites on the June list, is
backing more than one supercomputing horse. While most of IBM's
entries on the list are SP systems - clusters of symmetric
multiprocessor nodes - the company also has the biggest Linux
cluster, which is at the University of New Mexico and is number 102
on the list.
Thiel comments, "You need courage to put together this system and,
at the moment, they're found largely in research establishments
rather than in commercial situations where you need high
reliability. However, development goes on: we're porting all our
work on SP systems to Linux clusters."
It's likely, then, that the future of supercomputing lies with
standard machines linked together in increasingly clever ways.
(Last November's Top 500 illustrated this trend. It included 28
networks of workstations, whereas six months earlier there had been
just 11.) For example, the problems of bandwidth and latency that
limit the speed of interaction when system elements are linked via
Ethernet can already be mitigated by using specialised connectivity
solutions, such as Myrinet or Quadrics.
"Quadrics Supercomputer World has networking technology that can
speed things up by a factor of three or four compared with
conventional approaches," reports Gillot.
The centralised and decentralised models are likely to continue to
co-exist in future. Zambrano says, "Many companies we're talking to
already have supercomputers or high-performance computers. Rather
than competing, we complement them by enabling them to take
advantage of processing power on their desks." Because
supercomputers are relatively expensive, it makes sense to focus
them on the tasks for which they are needed.
Looking ahead, the problem of sourcing computational power could be
solved in a more general way, making the central versus distributed
distinction obsolete. The notion of a grid for computing - like the
national grid for electricity - is attracting attention in
academic and commercial worlds: both Parsons and Thiel are
interested in different grid projects. With this approach,
computational power could be gathered over a network such as the
Internet and delivered to those who need it. In future, we could
have supercomputing on tap.
Has supercomputing had its day?
Are these heavy-duty
computations something that the commercial world needs to worry
about? The answer is yes. At the moment, they tend to be associated
with areas such as defence and bioinformatics: the application of
computing to life sciences and to the demands of the post-genomic
era. But Ulla Thiel, head of IBM's scientific and computing sector,
points out that commercial systems feature prominently on the list
of the world's top 500 supercomputers compiled by the universities
of Mannheim and Tennessee, with broker Charles Schwab at number 20.
Compaq, too, has its sights set firmly on the commercial market.
"These techniques may have been developed for technical
applications, but they're very applicable to tasks such as data
mining and customer relationship management," says Compaq's life
sciences manager Dominique Gillot. That is one reason Compaq is
working with Oracle. "I don't need to explain how much speed you
can gain by parallelising database interrogation," he adds.
Computation power: three different approaches
Peer-to-peer
For:
- Economical, uses resources that would otherwise be idle
- Could be extended to use memory and disc space within other
devices such as printers
- Resources automatically upgraded as users replace desktop
hardware
Against:
- Does not work where there is a large amount of dependency
between tasks or when processors working on different aspects of
the task need to communicate in real-time
- Has to wait for resources to be available
- Without proper prioritisation, impact on network can damage
more critical tasks
Supercomputer built from standard components
For:
- Cheap and easy to maintain compared with specialised
supercomputer
- Faster than P2P
- Can be dedicated to one task
- Can be upgraded
Against:
- Connections between processors may be slow compared with
specialised supercomputer (or else specialised connectivity can
push price up)
Specialised supercomputer
For:
- Can be optimised to a specific computational task
- Fastest
Against:
- Expensive - few companies have the resources to develop or buy
solutions that could just be used for a single application
- Can quickly become obsolete.
Usage trends from the list of the world's top 500
supercomputers
- The US keeps its prime position as supercomputer user and
producer, with only small changes in geographical distribution
- The number of machines used in industry decreased slightly from
245 to 236
- The number of machines used in research remained stable at 118
(from 119)
- The number of machines used in academia continues to recover to
92 from 86
- IBM dominates the top 500. It leads the list in both the number
of systems installed and the installed performance, with stable
shares of 40% and 42% respectively
- Sun is second in the number of systems with 81 (16.2%) and
fourth in performance with 8.6%
- Cray retains second position in installed performance with
13.1% (fourth in systems with 9%)
- SGI is third in the number of systems with 63 (12.6%) and third
in performance with 10.2%
- All vector-based systems are of Japanese origin.
Case Study: GeneProt commissions the world's first large-scale
supercomputing facility for medical research
In April a
Swiss-based start-up called GeneProt announced it was opening the
world's first large-scale centre dedicated to proteomics discovery.
Proteomics is the identification and study of proteins that might
be used as the basis for the development of drugs, or as "markers"
that can diagnose or prevent disease. The company defines the term
"proteome" as "the total protein profile of a fluid or a tissue at
a given time, which can vary during the maturation of cell types
and tissues and the progression or treatment of a disease". It will
obtain proteomic data by mass spectrometry analysis of the tissues
of healthy and sick people.
Processing this information to identify promising avenues for
development and to explore these avenues requires a lot of
computing power. Compaq has built GeneProt a supercomputing
facility based on 1,420 Alphaserver processors running Tru64 Unix.
Storage is handled by Storageworks and it has been estimated that
the initial storage requirements of 24 terabytes will double every
six to eight months.
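The storage estimate implies exponential growth, which is easy to underestimate. As a worked example, the sketch below projects capacity from the article's 24 terabytes for both ends of the quoted doubling period. The function name and the two-year horizon are choices made for illustration.

```python
def projected_storage_tb(months: float, initial_tb: float = 24.0,
                         doubling_months: float = 6.0) -> float:
    """Capacity after `months`, doubling every `doubling_months`."""
    return initial_tb * 2 ** (months / doubling_months)


if __name__ == "__main__":
    for period in (6.0, 8.0):
        tb = projected_storage_tb(24, doubling_months=period)
        print(f"doubling every {period:.0f} months: {tb:.0f} TB after 2 years")
```

Under these assumptions the facility would need between 192 and 384 terabytes after two years, between eight and 16 times the initial installation.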
GeneProt cited Compaq's record in the genomics market and its
ability to install the facility quickly as reasons for choosing
that particular supplier. The enterprise has also received an
equity investment from Compaq, part of a $100m (£71m) Genomics
investment programme launched by Compaq last autumn.
Processing speed is essential to the company's business ambitions.
Managing director Denis Hochstrasser says, "Some companies
entering the proteomics industry say they'll be able to provide
candidates for clinical testing within a few years. We believe we
will deliver potential therapeutic agents within six months."
Further research
www.epcc.ed.ac.uk -
Edinburgh Parallel Computing Centre: research into high-performance
computing, with more about grid computing
www.geneprot.com -
GeneProt's mission and technology
www.llnl.gov/asci -
about the ASCI project and ASCI White
www.tc.cornell.edu/services/edu/topics/glossary/index.asp
- glossary of supercomputing terminology
www.top500.org - the top 500
supercomputer sites
www.ud.com - includes details of
the Intel/United Devices project to research cancer cures.