IT helps in the appliance of science


Science and computing have come together to map the human gene structure. Simon Quicke takes a look at the role of machines in the project

Technology can be found behind most advances in health and medicine and the human genome project is no exception. Tucked away in the environs of Cambridge, scientists and computer programmers have been working to piece together a map of human genes.

The activities of the Sanger Centre, based just outside Cambridge, in mapping the human genome have already received worldwide recognition. In June last year it announced that the first third of the genome sequence had been produced.

Pinpointing where genes are, and learning more about how they behave, is crucial to advancing biology and medicine.

With greater information about genes, the building blocks of human beings, there should be more knowledge about what causes some of us to develop diseases.

The centre employs 570 staff and is reliant on technology, which in the main comes through its chosen partner Compaq. The problem the centre faces is being a victim of its own success.

Originally the Sanger Centre's contribution to the human genome project was planned to run until 2005 and produce one-sixth of the total DNA sequence. Other research groups would come up with the rest.

However, the target was later raised to one-third of the DNA sequence by 2003. The technology proved so good at its job that it increased the amount of research possible in the next couple of years.

"This has been due to a number of factors including improved laboratory and sequence collection techniques and equipment and faster computer chip speeds enabling more efficient and automated processing of the data," claims Tim Hubbard, head of human genome analysis at the Sanger Centre.

So far the Wellcome Trust, the charity behind the Sanger Centre, has coughed up £80m and there is more money to come over the next five years.

Highly qualified IT staff are hard enough to find, and the Sanger Centre needs skilled programmers who also have more than a passing interest in biology.

"Because the human genome project is unique we are able to attract high-calibre informaticians," claims Hubbard.

He argues that the challenge of being involved with research that is grabbing attention worldwide is appealing to technicians.

"We are dealing with quantities of data that are very unusual and facing problems that have never been seen before. Many IT suppliers believe that the work involved in the human genome project presents one of the strongest drivers for new innovations," he adds.

Richard Durbin, deputy director of the Sanger Centre, claims that the research material produced by the technicians at the Sanger Centre will still be used in 1,000 years' time.

"This research will be regarded as the most significant research in the last 100 years," he claims.

What technology can make possible is staggering, and enough has already been done to make Durbin's claim sound reasonable.

There should be more developments of note to come, particularly in cancer treatment. Everyone is hoping for a cure and the activities at the Sanger Centre could enable that to arrive sooner rather than later.

Running alongside the human genome project is a special team looking specifically at cancer. Optimists in that team expect some sort of breakthrough in movements towards eradicating cancer in the next two decades.

The sort of tasks asked of technology at Sanger could only have been delivered in the past by supercomputers. Rapid advances in chip speeds, high-density disc and memory have delivered greater computing power for less money and with less demand on space.

"It is not just the hardware. Our informatics staff bring skills to bear to create new, leaner and more efficient software to carry out analyses," claims Hubbard.

Technology developed in-house is being used to address specific areas in the search for the human genome. Software has been designed to speed up the search for single nucleotide polymorphisms (SNPs).

SNPs are useful because they indicate where humans differ from one another. Those differences give an insight into why some people are more prone to disease than others and help track the gene differences that cause illness.

Two people differ at about one in every 1,000 bases of DNA. These differences occur randomly throughout the genome sequence, so mapping where SNPs occur means comparing many human genomes.

Software has been developed in-house to crunch through datasets and compare sequences rapidly to produce quick SNP results.
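
As a rough illustration of the kind of comparison involved, the sketch below scans a handful of short sequences and reports the positions at which they disagree. It is not the Sanger Centre's software, which the article does not describe. The function name `find_snp_candidates` and the sample data are hypothetical, and the sketch assumes the sequences are already aligned, of equal length and free of sequencing errors, which real data would not be.

```python
from typing import List, Tuple


def find_snp_candidates(sequences: List[str]) -> List[Tuple[int, str]]:
    """Return (position, observed bases) for each column where the
    pre-aligned sequences do not all carry the same base."""
    if not sequences:
        return []
    length = len(sequences[0])
    # Assumption: inputs are already aligned, so every sequence has the same length.
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be pre-aligned to the same length")

    candidates = []
    # zip(*sequences) walks the alignment one column (one base position) at a time.
    for position, column in enumerate(zip(*sequences)):
        bases = set(column)
        if len(bases) > 1:
            candidates.append((position, "".join(sorted(bases))))
    return candidates


# Hypothetical toy data: four individuals, ten bases each.
samples = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGAACGTAC",
    "ACGTACGTCC",
]
print(find_snp_candidates(samples))  # [(3, 'AT'), (8, 'AC')]
```

Positions are counted from zero. A production tool would also have to perform the alignment itself and separate true variation from sequencing errors, which is where most of the computing effort goes.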

After the genome has been sequenced it will not be a case of switching off the lights and closing down the systems; the work will continue and technology will be used in the next stages of research.

Hubbard claims that the sort of things to be researched will include comparisons between genomes, gene identification, structure prediction and gene-gene and protein-protein interactions.

There is great demand for genome data from biologists and in the next three years the majority of the day-to-day work will involve trying to get the daily output from the centre up on the Internet and available to as many people as possible.

The other major task is refining the process of predicting where genes are in a sequence. This is still partly down to educated guesswork, and it produces numerous inaccuracies.

"All this means more resources to cope with wider and more complex interplay. Really, the DNA text string is the easy bit," adds Hubbard.

Powering the genome project

The Sanger Centre has more than 1,500 devices

This breaks down into:

350 Compaq Alpha Systems (DS10, DS20, ES40)
440 Node Blast Farm (PC/DS10/DS10L)
700 Alpha processors in total
250 PCs
150-plus network computers
250 NT/Mac ABI collection devices.

As of November 2000, storage on Raid disc totalled 22Tbytes, and the centre expects that to grow to 50-100Tbytes in the next three years.
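
To put the storage forecast in perspective, the short calculation below works out the yearly growth rate it implies. It assumes steady compound growth over exactly three years, which is a simplification and not something the centre has stated. The function name `annual_growth_rate` is illustrative only.

```python
def annual_growth_rate(start_tb: float, end_tb: float, years: float) -> float:
    """Compound annual growth rate needed to go from start_tb to end_tb."""
    return (end_tb / start_tb) ** (1.0 / years) - 1.0


# Figures from the fact box: 22Tbytes today, 50-100Tbytes in three years.
low = annual_growth_rate(22, 50, 3)
high = annual_growth_rate(22, 100, 3)
print(f"Implied growth: {low:.0%} to {high:.0%} a year")
```

On those assumptions, storage would need to grow by roughly a third to two-thirds every year.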
