In March 1989, Tim Berners-Lee submitted a proposal for an information
management system to his boss, Mike Sendall. Sendall wrote 'Vague, but
exciting' on the proposal and allowed Berners-Lee to develop what
eventually became known as the World Wide Web.
"I found it frustrating that in those days, there was different
information on different computers, but you had to log on to
different computers to get at it. Also, sometimes you had to learn
a different program on each computer," said Berners-Lee on his
website.
The proposal was originally intended to help scientists keep track of
the masses of information they compiled in reports. We have the web
today largely because of the research needs of physicists at the
European Organisation for Nuclear Research, Cern.
But this isn't the only case where the research needs of Cern's
scientists have led to innovations in web technologies.
In 1987 Cern worked with a US start-up with only 20 employees to
develop and deploy one of the first routers in Europe - the
ASM/2-32EM - to act as a firewall between Cern's public Ethernet
and its supercomputer. That company was Cisco. Today, the company
has more than 63,000 employees.
And the innovations haven't stopped. In 2005, the physics
laboratory built the first working intercontinental 10 Gigabit
Ethernet wide area network to carry the large amounts of data from
the Large Hadron Collider (LHC) particle accelerator project.
According to analyst firm Gartner, applications like this are now
rising to prominence in areas such as finance and banking.
So if the technologies at Cern anticipate future commercial
trends in internet technology, what is its IT department working on
at the moment, and what could be next for the public face of the
internet? One area is the use of database technology to handle the
masses of information generated by Cern's computing grid.
Cern will be using one of the biggest computer grids this summer
to pool the processing power of about 100,000 CPUs worldwide. It
will process information at a rate of 1Gbps, said Francois Grey,
head of Cern's IT communications team.
"The experiment will produce roughly 15 petabytes (15 million
Gbytes) of data a year - enough to fill 100,000 DVDs," he said.
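Figures of this size are easier to grasp with some simple unit arithmetic. The sketch below assumes decimal (SI) prefixes, single-layer 4.7 GB DVDs and a 365-day year; the quote states none of these, and other assumptions (dual-layer discs, for instance) give a different disc count. The function name `scale_check` is purely illustrative.

```python
import math

# Assumptions (not stated in the quote): decimal SI prefixes,
# single-layer 4.7 GB DVDs and a 365-day year.
PB = 10**15               # bytes in a petabyte
GB = 10**9                # bytes in a gigabyte
DVD_BYTES = 4.7 * GB      # capacity of one single-layer DVD
SECONDS_PER_YEAR = 365 * 24 * 3600


def scale_check(petabytes_per_year: int) -> dict:
    """Convert an annual data volume into gigabytes, DVDs and an average bit rate."""
    total_bytes = petabytes_per_year * PB
    return {
        "gigabytes": total_bytes / GB,
        # Round up: a partly filled disc still counts as a disc.
        "dvds": math.ceil(total_bytes / DVD_BYTES),
        # Sustained rate needed to move the whole volume evenly over one year.
        "avg_gbit_per_s": total_bytes * 8 / SECONDS_PER_YEAR / 10**9,
    }


result = scale_check(15)
print(f"{result['gigabytes']:,.0f} GB")
print(f"{result['dvds']:,} single-layer DVDs")
print(f"{result['avg_gbit_per_s']:.1f} Gbit/s sustained average")
```

The first line confirms that 15 petabytes is 15 million gigabytes under decimal prefixes; the disc count and average rate depend entirely on the stated assumptions.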
The constant demand for as much processing power as possible led
Cern to become one of the first users of clustering technology,
starting in 1996. It pioneered the use of clusters of low-cost Linux
servers working together as one large, powerful machine. Cern helped
develop software to ensure that the
reliability and virtualisation capabilities of databases could be
extended seamlessly across a cluster of commodity servers, greatly
reducing the cost of high-performance computing.
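The article does not describe how that software works. As a generic illustration only, one common way to make many cheap nodes behave like a single, more reliable data store is to hash-partition keys across them and keep a copy of each key on a second node, so that one failed server loses nothing. `Node` and `ClusterStore` below are hypothetical names, the "servers" are in-memory dictionaries, and this is not a model of the software Cern helped develop.

```python
import hashlib
from typing import Any, Dict, List


class Node:
    """Stands in for one commodity server; its storage is just a dict."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, Any] = {}
        self.up = True


class ClusterStore:
    """Hypothetical facade: clients see one key-value store spread over many nodes."""

    def __init__(self, nodes: List[Node], replicas: int = 2) -> None:
        if not 1 <= replicas <= len(nodes):
            raise ValueError("replicas must be between 1 and the number of nodes")
        self.nodes = nodes
        self.replicas = replicas

    def owners(self, key: str) -> List[Node]:
        # A stable hash (unlike Python's per-process salted hash()) picks a
        # primary node; the remaining copies go on the following nodes in the list.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        start = int.from_bytes(digest[:8], "big") % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(self.replicas)]

    def put(self, key: str, value: Any) -> None:
        # Write to every live owner of the key.
        live = [node for node in self.owners(key) if node.up]
        if not live:
            raise RuntimeError(f"no live node available for {key!r}")
        for node in live:
            node.data[key] = value

    def get(self, key: str) -> Any:
        # Try the primary first, then fall back to the replicas.
        for node in self.owners(key):
            if node.up and key in node.data:
                return node.data[key]
        raise KeyError(key)


# Usage: four hypothetical nodes, each key stored on two of them.
nodes = [Node(f"node{i}") for i in range(4)]
store = ClusterStore(nodes, replicas=2)
for run in range(10):
    store.put(f"run-{run}", {"events": run * 1000})

store.owners("run-3")[0].up = False   # simulate the primary node failing
print(store.get("run-3"))             # {'events': 3000}
```

Replication trades disk space for availability: with two copies per key, any single node can fail and every key is still readable from its other owner.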
Cern has also pushed database-clustering technology further to
enable a single database to run across a number of distributed
computers. The LHC Computing Grid (LCG) database deployment project
has set up a worldwide distributed database infrastructure for the
LHC, using a program called Oracle Streams to capture, filter and
synchronise data stores worldwide.
The software lets users control what information is put into a
stream (the connection between the point where data is captured and
its destination or destinations), how the stream is routed to nodes
worldwide, what happens to events in the stream and how the stream
terminates. By configuring the elements that act on the stream, a
user can decide precisely which data is filtered out and which is
sent where.
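The capture, filter, route and apply flow described above can be modelled in a few lines. This is a conceptual sketch, not Oracle Streams' actual interface: `ChangeEvent`, `Replica`, `Route` and `Stream` are hypothetical names, the destination databases are in-memory dictionaries, the site and table names are invented, and the filtering rules are plain Python predicates.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ChangeEvent:
    """One captured change to a row (hypothetical stand-in for a change record)."""
    table: str
    key: str
    value: dict


Rule = Callable[[ChangeEvent], bool]


@dataclass
class Replica:
    """A destination database, modelled as table name -> {key: value}."""
    name: str
    tables: Dict[str, dict] = field(default_factory=dict)

    def apply(self, event: ChangeEvent) -> None:
        self.tables.setdefault(event.table, {})[event.key] = event.value


@dataclass
class Route:
    """Sends an event to one replica only if the route's rule accepts it."""
    replica: Replica
    accepts: Rule


@dataclass
class Stream:
    capture_rules: List[Rule]
    routes: List[Route]
    queue: List[ChangeEvent] = field(default_factory=list)

    def capture(self, event: ChangeEvent) -> bool:
        # Filter at the source: only events every capture rule accepts are queued.
        if all(rule(event) for rule in self.capture_rules):
            self.queue.append(event)
            return True
        return False

    def propagate(self) -> int:
        # Deliver queued events in capture order along each matching route,
        # then drain the queue. Returns the number of deliveries made.
        deliveries = 0
        for event in self.queue:
            for route in self.routes:
                if route.accepts(event):
                    route.replica.apply(event)
                    deliveries += 1
        self.queue.clear()
        return deliveries


# Usage with hypothetical sites and tables.
site_a, site_b = Replica("site-a"), Replica("site-b")
stream = Stream(
    capture_rules=[lambda e: e.table != "scratch"],       # never replicate scratch data
    routes=[
        Route(site_a, lambda e: True),                     # site-a receives everything captured
        Route(site_b, lambda e: e.table == "conditions"),  # site-b only wants conditions data
    ],
)
stream.capture(ChangeEvent("conditions", "run-42", {"calibration": 1.02}))
stream.capture(ChangeEvent("bookkeeping", "run-42", {"files": 17}))
stream.capture(ChangeEvent("scratch", "tmp", {"x": 1}))    # rejected at capture
print(stream.propagate())   # 3
print(site_b.tables)        # {'conditions': {'run-42': {'calibration': 1.02}}}
```

Filtering once at capture keeps unwanted data off the network entirely, while per-route rules let each site subscribe only to the tables it needs.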
"The amount of data people are using on the web is only going to
grow as pipes get fatter and connection speeds are ramped up. As
the architectures for high-speed networks are installed, they will
only be as good if the underlying databases are able to deal with
gigabytes and maybe even petabytes of data," said Grey.
For companies with global operations, keeping mass stores of
data synchronised will be the next challenge, especially as data
processing requirements grow.
"For us, monitoring the database and streams performance has
been key towards maintaining grid control and in optimising any
larger scale set-up," said Grey.
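Grey does not say which metrics Cern tracks. One simple, widely used measure of replication health is apply lag: the delay between a change being captured at the source and applied at a destination. The function `apply_lag_report`, the site names and the alert threshold below are hypothetical, and the timestamps are invented sample inputs rather than Cern measurements.

```python
from statistics import median
from typing import Dict, Iterable, Tuple


def apply_lag_report(
    events: Iterable[Tuple[str, float, float]], threshold_s: float
) -> Dict[str, dict]:
    """Summarise per-site apply lag from (site, captured_at, applied_at) timestamps in seconds."""
    lags: Dict[str, list] = {}
    for site, captured_at, applied_at in events:
        lags.setdefault(site, []).append(applied_at - captured_at)

    report = {}
    for site, values in sorted(lags.items()):
        worst = max(values)
        report[site] = {
            "median_s": median(values),
            "max_s": worst,
            # Flag a site whose worst observed lag exceeds the threshold.
            "alert": worst > threshold_s,
        }
    return report


# Invented sample timestamps for two hypothetical sites.
samples = [
    ("site-a", 0.0, 1.5), ("site-a", 10.0, 11.0), ("site-a", 20.0, 22.0),
    ("site-b", 0.0, 4.0), ("site-b", 10.0, 40.0),
]
report = apply_lag_report(samples, threshold_s=15)
for site, stats in report.items():
    print(site, stats)
```

Reporting the maximum alongside the median distinguishes a site that is consistently slow from one that suffers occasional stalls.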
The challenges at Cern remain unresolved for now, but history
suggests that synchronising databases across grids and handling
petabytes of data a year will become challenges for commercial
organisations further down the line.
And if the work at Cern has shown one thing over time, it is the
laboratory's willingness to share the solutions to its problems
with the wider world.