In less than 10 years IT has gone from being a valuable
tool for life sciences to being at the heart of some of the most
important research projects ever undertaken.
The mapping of the human genome, along with work in other areas of the
life sciences, has created frighteningly big and fast-expanding
repositories of biological data. Scientists expect to find within
these oceans of information the answers to many of our most
pressing problems, from cancer to mental illness and ageing. But
with the pace of discovery far outpacing
Moore's Law,
IT has its work cut out.
The mapping of the human genome in 2003 produced a mass of data
which, printed as books, would make a stack 17 times higher than
Everest.
It was a feat on par with Newton's Principia, the discovery of
the double helix, and even the moon landing for what it meant to
mankind. And it would not have been possible without the most
sophisticated hardware and software in existence.
Gene database
Professor Janet Thornton - head of the
European Bioinformatics
Institute (EBI), a part of the European Molecular Biology
Laboratory - presides over some of the world's most extensive
databases of genes, genomes, proteins, nucleotides and other
matter. EBI is one of the world's leading institutes for the
application of information technology to biology.
Its European Nucleotide Archive, a repository for public
nucleotide sequence data, holds 3.47 terabases (3.47 x 10^12
bases) of sequence, translating to 106.9TB (terabytes) in storage.
The next release of UniProt, a collaborative protein information
database, will contain information on nearly 8 million proteins.
Currently the EBI's stores of biological data amount to 4 petabytes
(one petabyte is about one quadrillion bytes, or 1,024TB).
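As a rough check on the figures quoted above, the short Python sketch below works out how many bytes of storage the archive uses per base of sequence, and how 4 petabytes compares under decimal and binary units. The names are illustrative, and reading "TB" as a decimal terabyte (10^12 bytes) is an assumption rather than something the EBI has stated.

```python
# Figures quoted in the article for the European Nucleotide Archive.
BASES = 3.47e12          # 3.47 terabases of sequence
STORAGE_TB = 106.9       # storage footprint in terabytes

# Assumption: "TB" is read as a decimal terabyte (10**12 bytes).
BYTES_PER_TB = 10**12


def bytes_per_base(bases: float, storage_tb: float) -> float:
    """Average bytes of storage used per base of sequence."""
    return storage_tb * BYTES_PER_TB / bases


def petabytes_to_bytes(pb: int, binary: bool = False) -> int:
    """Convert petabytes to bytes using decimal (10**15) or binary (2**50) units."""
    return pb * (2**50 if binary else 10**15)


if __name__ == "__main__":
    print(f"{bytes_per_base(BASES, STORAGE_TB):.1f} bytes per base")
    print(f"4PB decimal: {petabytes_to_bytes(4):,} bytes")
    print(f"4PB binary:  {petabytes_to_bytes(4, binary=True):,} bytes")
```

On these figures the archive uses roughly 31 bytes of storage per base, far more than the single byte needed to hold one letter of sequence as plain text; the article does not say what the remainder holds.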
It is reasonable to hope that within the bulging repositories of
EBI and others lie the answers to many of our most pressing
problems, such as Alzheimer's, cancer, the reasons for
intelligence, causes of mental illness and ageing, to name but a
few.
But the task of finding them now depends more than ever on the
quality of innovations emerging from the IT sector.
Leading IT companies including IBM, Microsoft, EMC, Sun
Microsystems, Oracle and others have been increasing their
investment in life sciences in anticipation of strong market growth
in the next few years.
Oracle, for instance, offers a suite of applications for
pharmaceutical and medical research groups and counts the top 20
organisations in both areas among its customers.
Similarly, IBM has seen its life sciences business expand
significantly over the past ten years.
Driving the market is the fact that as more and more biological
information is collected, more computing power is needed to go
through it all and check for possible applications in disease
treatment and health.
Take the
Human Genome
Project (HGP) for instance. Mapping human DNA is one thing,
but it is quite another to test the reactions of genes to drugs
and to explore a virtually infinite sea of biological possibilities,
any of which might represent a cure for a given disease.
"There are fewer than 25,000 human genes - but try to do
combinatorial studies between them and it starts to get quite
mindboggling," says Thornton.
"To say that it is a mountain to climb is an understatement; the
current flood of data easily outpaces Moore's law."
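Thornton's point about combinatorial studies can be made concrete with a short Python sketch that counts how many distinct groups of genes there are to examine. It takes 25,000 as an upper bound, since she says there are fewer than that, and the function name is illustrative; it simply counts combinations and says nothing about how such studies are actually run.

```python
import math

GENE_COUNT = 25_000  # upper bound: "fewer than 25,000 human genes"


def gene_combinations(group_size: int, genes: int = GENE_COUNT) -> int:
    """Number of distinct unordered groups of `group_size` genes."""
    return math.comb(genes, group_size)


if __name__ == "__main__":
    for k in (2, 3, 4):
        print(f"groups of {k}: {gene_combinations(k):,}")
```

Pairs alone run to more than 312 million, and groups of three to about 2.6 trillion.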
UK-based Titian supplies several of the world's largest
pharmaceutical companies with systems for the management and
rapid retrieval of biological samples, which in most cases number in
the millions.
"There are the chemicals which could be small molecules,
synthesis compounds, or natural products and antibodies - all items
the company potentially expects to be their drug compounds,"
explains CEO Richard Fry.
"It is a frozen treasure trove that could be the next
billion-dollar drug."
High-throughput screening
When Darwin was looking for links between the species, he was
able to use only what he could see in front of him. Now, thanks to
high-throughput screening (HTS), researchers can see what is
related to what in incredible detail.
"It just does not work anymore to have large numbers of people
doing things manually," Fry says.
HTS allows a researcher to quickly conduct millions of
biochemical, genetic or pharmacological tests. Through this process
one can rapidly identify active compounds, antibodies or genes
which modulate a particular biomolecular pathway. The results of
these experiments provide starting points for drug design and for
understanding the interaction or role of a particular biochemical
process in biology.
A key tool for HTS is the microarray, a kind of biological
computer chip made of glass designed to enable high-speed assaying
of compounds and their reactions. But storing and managing the
information being generated by HTS is, of course, a major
challenge.
Furthermore, EBI's Thornton says that the development of new
sequencing chains has increased the rate of gene sequencing by one,
if not two, orders of magnitude.
"Now, realistically, we can generate 10 to the power of 9 (the
size of one genome) in about two days," she says.
Just two years ago the expected time frame for accomplishing
this would have been several years.
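To put the comparison with Moore's Law in rough numbers, the Python sketch below works out how long steady doubling would take to deliver a gain of one or two orders of magnitude. The two-year doubling period is the commonly quoted form of Moore's Law and is an assumption here, not a figure from the article; the function name is illustrative.

```python
import math

# Assumption: Moore's Law taken as a doubling every two years.
DOUBLING_PERIOD_YEARS = 2.0


def years_to_reach(factor: float, doubling_period: float = DOUBLING_PERIOD_YEARS) -> float:
    """Years of steady doubling needed to improve by the given factor."""
    return doubling_period * math.log2(factor)


if __name__ == "__main__":
    for factor in (10, 100):  # one and two orders of magnitude
        print(f"{factor}x takes about {years_to_reach(factor):.1f} years of doubling")
```

On that assumption a tenfold gain takes between six and seven years of doubling and a hundredfold gain about 13, which gives a sense of how far gains of this size run ahead of the hardware curve.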
EBI is one of a handful of organisations involved in the
1000 Genomes Project, an international research consortium
hoping to create a detailed picture of human genetic variation. The
project involves sequencing the genomes of approximately 1,200
people from around the world.
Thornton says that a few years ago when the first tranche of
data arrived at Cambridge from the project, it alone was greater
than all of the genetic data then held by EBI. It was also the
first time that EBI had taken information on specific individuals,
an event that highlighted important issues of security and privacy
in biological research.
The EBI's data is housed within a 160 square-metre section of
the Wellcome Trust datacentre. Recently, Thornton and her team
calculated that they would soon need ten times that space to
adequately house data from EBI's many fast-growing projects.
Globally, further sources of new and yet-to-be-understood data
are being discovered. Seen as the chemical equivalent of the HGP,
the Human Metabolome Project, led by Canada's University of Alberta, has
so far listed close to 3,000 chemicals found in or made by the
human body - triple what was expected, with double the number of
substances stemming from drugs and food. The chemicals, known as
metabolites, represent the ingredients of life, just as the human
genome represents the blueprint for life, and they promise
exciting new breakthroughs.
Microscopic images
Another emerging area is that of high-throughput, or
large-scale, analysis and processing of microscopic images.
"In the next 10 years we will have pictures of cells and organs
that can be analysed by computers," predicts Thornton, adding,
"This has not even started yet."
IBM's Healthcare Information and Imaging Grid (HIIG), launched
last December, aims to address some of these challenges. The
company also announced new software features for the IBM Grid
Medical Archive Solution (GMAS), a high-performance, grid-based
storage solution.
Its new software component, GAM 2.1, will now support
applications in digital pathology, high-throughput screening and
mass spectrometry (MS). MS is an analytical technique for
understanding the composition of a sample or molecule, and involves
ionising chemical compounds and measuring properties such as
mass-to-charge ratios.
In-silico testing is another area expected to see huge growth in
the next few years as computers get better at simulating clinical
trials that would normally depend on data taken from animals and
humans. The cost savings to pharmaceutical companies and research
institutes would be enormous.
"Any rational scientist will say any method that can improve our
ability to accurately predict the effects of drugs or chemicals on
human beings has to be beneficial and would reduce the need for
animal experiments," notes Thornton. "But again it will take
time."
It has also been suggested that bioinformatics may in future
allow scientists to reconstruct the genomes of extinct animals and
possibly bring them back to life.
The effective utilisation of all of this information will depend
largely on technologies capable of managing and interpreting it all
within a central repository, so different types of data can be
effectively cross-referenced.
Open source
Further, as more and more information is accumulated around the
world, it is crucial that scientists are better able to share data
and collaborate. A compound reaction discovered in Japan may, for
instance, have implications for a clinical trial in the UK.
Previously, such connections would have been the result of coincidence. Now
the EBI and other groups are attempting to reach agreement on the
development of open access platforms for biological data. Systems
employing or modelled on the concept of open source software are
expected to play a fundamental role. It is also hoped that
internet-based "browsers" for cancer, genomes and other areas of
research will aid in the sharing of information.
The success of these and other attempts to foster global
collaboration could have major implications for drugs and other
areas of discovery. But there is a long way to go.
"Of course, what we all want to do is convert that data into
knowledge and understanding, and translate that into improved
health, ecology and biodiversity," Thornton says.
"But there is much work in developing new algorithms and
approaches to interpreting and finding patterns in the data; we
still do not really understand the molecular basis for ageing."