The Large Hadron Collider at Cern generates vast amounts of data. Over 15 petabytes of information is processed every year across 100,000 processors distributed around the world. Cliff Saran speaks to one of the scientists on the team that develops data analysis software for the project.
The LHC is running an experiment in which two beams of protons are accelerated to high speed in opposite directions around a 27 km ring. When the beams collide, sub-atomic particles are created, which are picked up by arrays of ultra-sensitive sensors. The project aims to identify the Higgs Boson, often called the "god particle": a hypothetical particle predicted by the Standard Model, the framework physicists use to understand the universe.
The data analysis required to identify the presence of the Higgs Boson is an immensely complex task. At Cern, a group of scientists is working on a data analysis framework called Root, developed as a C++ library, to support the analysis side of the experiment.
Axel Naumann, who works on the Root development team, says: "We collect lots of data. There are four experiments. Each experiment has its own analysis software, but rather than reinventing the wheel, Cern provides some basic ingredients." Root is one of these ingredients: a project that provides data storage for C++ objects. Physicists use Root to load the data onto their computers and run mathematical and statistical tools, and the team also generates the PDF documentation for them.
About 10,000 users around the world use Root. The framework is built on 2.5 million lines of code and is supported by just six developers.
Code quality is paramount to ensure that the data analysis is correct. At the first level, continuous integration, Root is built continuously on Linux, Unix and Windows; the code is compiled and checked for compilation errors. The second stage is unit testing, which checks more complete sets of features in the Root system. On the experiment side, physicists take the code and run test cases against the new version of Root to make sure they avoid discrepancies.
"Bugs get fixed pretty quickly," says Naumann. "One of the problems with testing is that we don't always know what to test or there are certain cases we cannot test."
Cern uses static code analysis to get around the issue that certain errors cannot be recreated in a test environment. "We have used dynamic testing tools, but found static tools give us a new dimension in testing," he explains. Coverity is used for static testing of the Root framework. He describes most static analysis tools as "brainless" because such tools generally produce large numbers of false positives. They identify a code fault that could occur under certain conditions, which the developer then ignores because the error would not occur in the way the code has been written. "With Coverity, we had a false positive rate of 11%, which is amazing for static analysis."