In search of the Big Bang

This month (August) the world's biggest particle accelerator, the Large Hadron Collider (LHC), will begin hurling subatomic particles called protons around a 27km circular tunnel running beneath the Swiss-French border, before crashing them into each other. By doing so, particle physicists hope to learn more about the physical universe. At the same time, they are reinventing the way they share their research with each other.

An international initiative that will involve more than 2,000 physicists from 150 research institutions in more than 30 countries, the LHC is being managed by the Geneva-based particle physics laboratory CERN, the European Organization for Nuclear Research. Founded in 1954, CERN has a distinguished scientific pedigree - it has been home to three Nobel laureates, and is where in 1990 computer scientist Tim Berners-Lee invented the World Wide Web.

The objective of particle physics - also known as high-energy physics or HEP - is to study the tiniest objects in nature to answer two fundamental questions: What is the world made of and what holds it together? To answer these questions, it is necessary to recreate the conditions that prevailed just after the Big Bang - which is the aim of particle accelerators.

Physicist Rolf-Dieter Heuer, who becomes CERN director general in January, explains, "We want to unravel the secrets of the microcosm and of the early universe. The LHC has the highest energy ever obtained in a collider, and so will bring us closer to the Big Bang and to the early universe."

A key objective will be to test the so-called Standard Model of particle physics - the best theory currently available to describe the 12 elementary particles that make up all matter, and three of the four fundamental forces that cause these particles to interact (gravity is not part of the model).

At the moment an important cornerstone of the Standard Model is missing: it does not explain how matter and force particles get their mass. "The Standard Model only works for massless particles, and we know that [with a few exceptions] the fundamental particles of the universe are not massless," says Heuer.

Higgs mechanism

To explain this missing component, physicists have postulated the so-called Higgs mechanism. "We know from theory, and we know from our precision tests, that the answer to the question of how particles gain mass must lie within the energy reach of the LHC," says Heuer. "If we don't find it, the Higgs mechanism doesn't exist, and theorists will have to find another theory to explain how particles acquire mass."

Physicists also hope to gain some insight into "dark matter" - matter that is invisible but whose presence can be inferred from its gravitational effects on visible matter. Cosmological observations suggest that visible matter makes up less than 5% of the energy and matter content of the universe, with more than 95% consisting of dark matter and dark energy. "Our hope is to get a first glimpse of dark matter," says Heuer.

It has taken six and a half years to build the LHC, at a cost of £4.75bn. But creating the world's most expensive collider is just the first challenge posed by the LHC. Another is how to collect and analyse the data.

The LHC will accelerate two beams of particles in opposite directions to nearly the speed of light. When they reach a sufficiently high energy, they will be crashed into each other in collisions that will break the original protons apart, sending off a spray of other particles. Monitoring these collisions will be a series of enormous experimental instruments called detectors, including one as high as a five-storey building, called ATLAS, and another the size of 40 large aeroplanes, called CMS.

As particles pass through these detectors, they will be counted, traced and analysed using extremely sensitive equipment. The trackers of both ATLAS and CMS, for instance, contain silicon wafers. As charged particles pass through these wafers, they will give rise to electrical signals that betray their passage. Outside the trackers are calorimeters, which slow down and absorb the particles, measuring their energy. The time it takes for a particle to pass through will also be precisely measured.

The LHC will be both one of the coldest and one of the hottest places in the universe. To make its magnets superconducting, so that they can generate the very strong magnetic fields needed to steer the beams, they will have to be cooled to -271.25°C. Then, when two beams of protons collide, the temperature at the collision point will soar to 100,000 times that at the heart of the sun.

Volumes of data

The next challenge will be managing the huge volumes of data generated. "There will be 40 million collisions a second, but we can only afford to write a tiny fraction of them to tape," says Salvatore Mele, a project leader at CERN.

Because most of the events observed in the detectors will be unremarkable, the secret lies in homing in on the unusual and recording only the 200 most interesting events every second. Even so, about 15 petabytes of data will be generated annually. If stored on CDs, this would create a 20km-high tower of discs.
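The scale of that filtering, and of the CD comparison, can be checked with some back-of-envelope arithmetic. The short Python sketch below uses the collision rate, recording rate and 15PB figure quoted above; the CD capacity (700MB) and disc thickness (1.2mm) are our own assumptions, not CERN figures. On those assumptions the stack comes out at about 26km, the same order of magnitude as the 20km quoted; the exact height depends on the disc capacity and thickness assumed.

```python
import math

# Back-of-envelope check of the data-volume figures quoted in the article.
# From the article: 40 million collisions/s, ~200 events/s recorded, ~15 PB/year.
# The CD capacity and thickness below are ASSUMPTIONS, not article figures.

COLLISIONS_PER_SECOND = 40_000_000   # quoted by Salvatore Mele
RECORDED_PER_SECOND = 200            # "the 200 most interesting events every second"
DATA_PER_YEAR_BYTES = 15e15          # ~15 petabytes, in decimal units

CD_CAPACITY_BYTES = 700e6            # assumption: a standard 700 MB CD
CD_THICKNESS_M = 1.2e-3              # assumption: each disc about 1.2 mm thick


def selection_ratio(total_rate: float, kept_rate: float) -> float:
    """How many collisions occur for every one that is written to storage."""
    return total_rate / kept_rate


def cd_stack_height_km(data_bytes: float,
                       disc_bytes: float = CD_CAPACITY_BYTES,
                       disc_thickness_m: float = CD_THICKNESS_M) -> float:
    """Height in km of a stack of CDs holding data_bytes (a partial disc counts as one)."""
    discs = math.ceil(data_bytes / disc_bytes)
    return discs * disc_thickness_m / 1000


if __name__ == "__main__":
    ratio = selection_ratio(COLLISIONS_PER_SECOND, RECORDED_PER_SECOND)
    print(f"Keep 1 collision in {ratio:,.0f}")        # Keep 1 collision in 200,000
    print(f"Fraction kept: {1 / ratio:.4%}")          # Fraction kept: 0.0005%
    height = cd_stack_height_km(DATA_PER_YEAR_BYTES)
    print(f"CD stack height: about {height:.0f} km")  # CD stack height: about 26 km
```

In other words, the detectors discard all but roughly one collision in every 200,000.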

Once collected, the data will be processed and used to perform complex theoretical simulations, a task requiring massive computing capacity. The problem, says Heuer, is that "no science centre, no research institution, and no particle physics lab in the world has enough computer power to do all the work".

CERN will distribute the data to a network of computing centres around the world using a dedicated computing grid. This will allow the workload to be shared, and ensure there are multiple copies of the data stored in case of failure.
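How replicas protect against failure can be illustrated with a toy model. The Python sketch below is not how CERN's grid actually assigns data; the site names, dataset sizes and the "put each copy on the least-loaded site" rule are all hypothetical, chosen only to show the two properties described above: the workload is spread across centres, and each dataset lives on more than one site.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """A hypothetical computing centre; all names and sizes here are invented."""
    name: str
    stored_tb: float = 0.0
    datasets: list[str] = field(default_factory=list)


def place_replicas(dataset: str, size_tb: float, sites: list[Site], copies: int = 2) -> list[str]:
    """Store `copies` replicas of a dataset on distinct sites, least-loaded first.

    Choosing the emptiest sites spreads the storage load; putting each replica
    on a different site means one failed centre cannot take every copy with it.
    """
    if copies > len(sites):
        raise ValueError("not enough sites for the requested number of copies")
    chosen = sorted(sites, key=lambda s: s.stored_tb)[:copies]
    for site in chosen:
        site.stored_tb += size_tb
        site.datasets.append(dataset)
    return [site.name for site in chosen]


def surviving_copies(dataset: str, sites: list[Site], failed: set[str]) -> int:
    """Count the replicas of a dataset still held by sites that have not failed."""
    return sum(1 for s in sites if s.name not in failed and dataset in s.datasets)


def demo() -> list[Site]:
    """Spread six 10 TB datasets, two copies each, over three hypothetical sites."""
    sites = [Site("site-a"), Site("site-b"), Site("site-c")]
    for i in range(6):
        place_replicas(f"run-{i}", size_tb=10.0, sites=sites, copies=2)
    return sites


if __name__ == "__main__":
    sites = demo()
    for s in sites:
        print(s.name, s.stored_tb, s.datasets)
    # Every dataset still has at least one copy if site-a goes down.
    print(all(surviving_copies(f"run-{i}", sites, {"site-a"}) >= 1 for i in range(6)))
```

With three sites and two copies of every dataset, each site ends up holding the same 40TB, and any single site can fail without a dataset being lost.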

But the biggest challenge will be storing the data in a format that allows reuse. Historically, when a HEP experiment ended, its data was abandoned. But because it is costing £4.75bn to collect the LHC data, it would be profligate not to allow reuse.

"Ten or 20 years ago we might have been able to repeat an experiment," says Heuer. "They were simpler, cheaper and on a smaller scale. Today that is not the case. So if we need to re-evaluate the data we collect to test a new theory, or adjust it to a new development, we are going to have to be able to reuse it. That means we are going to need to save it as open data."

Formalising knowledge

The problem, says Mele, is that HEP data is generally written in "an experiment-proprietary non-standard format" that only those working on the experiment understand. Moreover, the knowledge needed to interpret it resides only in scientists' heads, and is forgotten once an experiment is finished. So the answer lies in formalising that knowledge and embedding it in the saved data. But for the moment, no one knows how to do this. "It remains a challenge for the techies," says Heuer.

Openness is not an issue for data alone, however. The research papers produced from the LHC experiments will also have to be open - which presents a different kind of challenge.

Today, when scientists publish their papers, they assign copyright to the publisher. Publishers arrange for the papers to be peer-reviewed, and then sell the final version back to the research community in the form of journal subscriptions.

But because of an explosion in research during recent decades, along with rampant journal price inflation, few research institutions can now afford all the journals they need. "Journal prices are rising very strongly," says Heuer. "So the reality today is that lots of researchers can no longer afford access to the papers they need."

This problem is not unique to particle physics - it affects the entire research community, and has given rise to the Open Access (OA) movement, which calls for all peer-reviewed scientific literature to be made freely available online.

Peer review

As the LHC countdown began, the HEP community launched a number of OA initiatives. In 2006, for instance, CERN spearheaded a new project called SCOAP3 (the Sponsoring Consortium for Open Access Publishing in Particle Physics), which aims to pay publishers to organise peer review on an outsourced basis, thus allowing published research to be made freely available.

Funding bodies and research institutions are being asked to redirect the money they currently spend on journal subscriptions to a common fund managed by SCOAP3. Publishers will then be invited to tender for the peer review services they already provide, but without acquiring ownership of the research. The services will be paid for centrally by the SCOAP3 consortium, and the papers placed on the open Web.

Essentially, it is a radical plan to "flip" the entire HEP journal literature from a subscription-based model - in which a paywall is erected between scientist and research - to an Open Access model. "The aim is to make all HEP journal articles free to read and reuse as we want, and at the same time alleviate the serials crisis," says Annette Holtkamp, an information professional at DESY, Germany's largest HEP research centre, and a member of the SCOAP3 working party.

A second initiative will see the creation of a free online HEP database called INSPIRE. This will be pre-filled with nearly 2 million bibliographic records and full-text "preprints" harvested from existing HEP databases such as arXiv, SPIRES and the CERN Document Server (CDS).

If SCOAP3 proves successful, the final full-text version of every HEP paper published will be deposited in INSPIRE, making it a central resource containing the entire corpus of particle physics research.

Database model

This suggests scholarly publishing is set to migrate from a journal-based to a database model, and one likely consequence will be the development of "overlay journals". Instead of submitting their papers to publishers, researchers will deposit their preprints into online repositories such as INSPIRE. Publishers will then select papers, subject them to peer review (for which they will levy a service charge), and "publish" them as Web-based journals - although, in reality, the journals will be little more than a series of links to repository-based papers.
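The idea that an overlay journal is "little more than a series of links" can be made concrete with a small data model. In the Python sketch below, which is purely illustrative, the journal name, record identifiers and repository URL (including its /record/ path) are invented; the point is that the journal object stores only the identifiers of papers that passed review, while the papers themselves stay in the repository.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Preprint:
    """A paper deposited in an open repository. All identifiers here are invented."""
    record_id: str
    title: str


@dataclass
class OverlayJournal:
    """Hypothetical overlay journal: it holds links, never the papers themselves."""
    name: str
    repository_url: str                      # placeholder URL; the /record/ scheme is invented
    accepted_ids: list[str] = field(default_factory=list)

    def accept(self, preprint: Preprint, passed_review: bool) -> None:
        """Add a link to the preprint only if the publisher's peer review passed it."""
        if passed_review and preprint.record_id not in self.accepted_ids:
            self.accepted_ids.append(preprint.record_id)

    def table_of_contents(self) -> list[str]:
        """The 'published' journal: a list of links back into the repository."""
        return [f"{self.repository_url}/record/{rid}" for rid in self.accepted_ids]


if __name__ == "__main__":
    repo = {
        "0001": Preprint("0001", "A search for new particles"),
        "0002": Preprint("0002", "A paper that did not pass review"),
    }
    journal = OverlayJournal("Hypothetical HEP Letters", "https://repository.example.org")
    journal.accept(repo["0001"], passed_review=True)
    journal.accept(repo["0002"], passed_review=False)
    print(journal.table_of_contents())
    # ['https://repository.example.org/record/0001']
```

Because the journal holds only links, withdrawing it would leave every paper still readable in the repository.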

"INSPIRE would be an ideal test-bed to experiment with overlay journals, because it will contain the entire corpus of the discipline," says Holtkamp.

At the same time, more and more research will take place, and be "published", on blogs, wikis and in open lab notebooks. "The classical journal article will not remain the main vehicle for scholarly communication," says Holtkamp. "In the future we can expect to see different materials and media used at different stages of the research process."

What is key to current developments is the belief that scientific information must be openly available. Because science is a cumulative process, the greater the number of people who can access research, critique it, check it against the underlying data and then build on it, the sooner new solutions and theories will emerge. And as "Big Science" projects like the LHC become the norm, the need for openness will be even greater because the larger the project, the more complex the task, and the greater the need for collaboration - a concept neatly expressed in the context of Open Source software by Linus' Law: "Given enough eyeballs, all bugs are shallow."

Holtkamp adds, "I am pretty confident that Open Access will be the standard of the future for scientific papers, although it remains unclear when Open Data will become the norm."

Certainly, if the public is asked to fund further multi-billion-pound projects like the LHC, there will be growing pressure on scientists to maximise the value of the data they generate - and that will require greater openness.