CERN connects India to data network

India is now connected to CERN's Large Hadron Collider grid through a 1 Gbps network provided by Reliance Globalcom.

The Ethernet over MPLS network provides end-to-end connectivity, which allows CERN's Indian datacentre partner, TIFR (Tata Institute of Fundamental Research), near-instant access to the vast volumes of data generated by CERN's particle experiments programme in Geneva.

IT deputy department head at CERN, David Foster, said, "This is an important milestone showing how large-scale scientific projects like the Large Hadron Collider (LHC) at CERN unify communities worldwide."

He said that previously it was expensive to connect to India. "It has been difficult for India to get high bandwidth to the rest of the world." But Mumbai, where TIFR is based, is a Tier 2 centre. It is one of a few hundred centres where analysis for the majority of the experiments is done, said Foster.

The IT challenge

CERN's IT department serves the laboratory, which has 10,000 people on site, plus an additional 10,000 users worldwide. "We provide a wide range of services from localising infrastructure like web and email all through to [supporting] high capacity international networks and delivery of data."

CERN runs four experiments, continually gathering data. "Experiments run 24/7 to capture as much data as possible. It's statistical. We collide particles and create new particles. We have to keep colliding to generate enough events to statistically prove that particles of a given characteristic exist."

The results of each experiment are captured using sensors that produce data equivalent to a 100-megapixel camera taking 40 million photographs per second. This equates to a petabyte of data per second.

"We cannot record data at this rate. Instead it gets filtered to a manageable size in the order of 1 Gbyte of data per second, written to tape and sent over the network to labs around the world," said Foster.
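The figures quoted here can be sanity-checked with some rough arithmetic. The sketch below is illustrative only: the article does not say how many bytes each "pixel" produces, so BYTES_PER_PIXEL is an assumption, and the function names are made up for this example.

```python
# Back-of-envelope check of the detector data rates quoted in the article.
PIXELS_PER_FRAME = 100e6          # "100-megapixel camera" (from the article)
FRAMES_PER_SECOND = 40e6          # "40 million photographs per second" (from the article)
RECORDED_BYTES_PER_SECOND = 1e9   # "1 Gbyte of data per second" (from the article)
BYTES_PER_PIXEL = 1               # ASSUMPTION: the article gives no sample size


def raw_rate_bytes_per_second(pixels=PIXELS_PER_FRAME,
                              frames=FRAMES_PER_SECOND,
                              bytes_per_pixel=BYTES_PER_PIXEL):
    """Unfiltered sensor output in bytes per second."""
    return pixels * frames * bytes_per_pixel


def reduction_factor(raw, recorded=RECORDED_BYTES_PER_SECOND):
    """How many bytes are produced for every byte that is kept."""
    return raw / recorded


raw = raw_rate_bytes_per_second()
print(f"raw rate: {raw / 1e15:.1f} PB/s")
print(f"kept: 1 byte in every {reduction_factor(raw):,.0f}")
```

At one byte per pixel the raw rate works out at 4 PB/s, the same order of magnitude as the petabyte per second quoted, which means the filtering step keeps roughly one byte in every few million.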

Foster said that data is retained for many years as there will always be opportunities for reprocessing. "We have a very large tape robot - about 48 petabytes worth of data."

A fundamental part of the grid is the file catalogue, which retains information on the data and where it is located. When a researcher needs to run some analysis, the processing job is sent to the site that holds the relevant data.
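The idea of sending the job to the data, rather than the data to the job, can be sketched in a few lines. This is a toy model, not the grid's actual catalogue software: the FileCatalogue class, the route_job function, and the dataset and site names are all hypothetical.

```python
from collections import defaultdict


class FileCatalogue:
    """Toy catalogue mapping dataset names to the sites that hold a copy."""

    def __init__(self):
        self._replicas = defaultdict(set)

    def register(self, dataset, site):
        """Record that `site` holds a copy of `dataset`."""
        self._replicas[dataset].add(site)

    def sites_for(self, dataset):
        """Return the sites holding `dataset`, in alphabetical order."""
        return sorted(self._replicas.get(dataset, ()))


def route_job(catalogue, dataset):
    """Pick a site that already holds the data the job needs."""
    sites = catalogue.sites_for(dataset)
    if not sites:
        raise LookupError(f"no site holds dataset {dataset!r}")
    # ASSUMPTION: take the first site alphabetically; a real system
    # would weigh load, locality and policy.
    return sites[0]


catalogue = FileCatalogue()
catalogue.register("run-001", "site-b")   # hypothetical names
catalogue.register("run-001", "site-a")
catalogue.register("run-002", "site-b")

print(route_job(catalogue, "run-001"))
print(route_job(catalogue, "run-002"))
```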

The grid

The LHC co-ordinates a computing grid to provide a homogeneous facility worldwide for running data analysis jobs.

The grid is connected to the Tier 1 centres via 10 Gbps dedicated circuits. Some sites connect at 20 Gbps. Tier 1 sites connect to Tier 2 sites over commodity IP networks, such as the commercial internet and education networks, or over dedicated circuits.
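To give a feel for what these link speeds mean in practice, the sketch below estimates how long a fixed amount of data would take to move at the 1 Gbps of the India link and at the 10 Gbps and 20 Gbps Tier 1 circuits. It is an idealised calculation: the one-terabyte payload is an arbitrary illustration, and protocol overhead, contention and latency are all ignored.

```python
# Idealised transfer times for the link speeds mentioned in the article.
LINKS_GBPS = {
    "India link (1 Gbps)": 1,
    "Tier 1 circuit (10 Gbps)": 10,
    "Tier 1 circuit (20 Gbps)": 20,
}

ONE_TERABYTE = 1e12  # decimal terabyte; ASSUMPTION: illustrative payload size


def transfer_seconds(size_bytes, link_gbps):
    """Seconds to move `size_bytes` over a link, ignoring all overhead."""
    bits = size_bytes * 8
    return bits / (link_gbps * 1e9)


for name, gbps in LINKS_GBPS.items():
    seconds = transfer_seconds(ONE_TERABYTE, gbps)
    print(f"{name}: {seconds / 60:.1f} minutes per terabyte")
```

Under these assumptions a terabyte takes a little over two hours at 1 Gbps and under 15 minutes at 10 Gbps.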

The grid is essentially a software system that links all the resources together. This involves sophisticated middleware projects such as the European Middleware Initiative and European Grid Initiative (EGI).

The grid allows seamless movement of jobs. The EGI distributes data around the grid and ensures jobs are executed. A resource broker is used to match the job to where the data is. Once a job has been allotted to a local site, it is put into a job queue, where it is scheduled to run.
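The broker-then-queue flow described above might look something like the following. This is a minimal sketch rather than the grid middleware itself: the Site class and broker_submit function are hypothetical, and choosing the site with the shortest queue is an assumption made for illustration.

```python
from collections import deque


class Site:
    """Toy grid site with a set of local datasets and a FIFO job queue."""

    def __init__(self, name, datasets):
        self.name = name
        self.datasets = set(datasets)
        self.queue = deque()

    def run_next(self):
        """Take the oldest queued job, or None if the queue is empty."""
        return self.queue.popleft() if self.queue else None


def broker_submit(job_id, dataset, sites):
    """Match a job to a site that holds its data, then queue it there."""
    candidates = [s for s in sites if dataset in s.datasets]
    if not candidates:
        raise LookupError(f"no site holds dataset {dataset!r}")
    # ASSUMPTION: prefer the site with the fewest waiting jobs;
    # ties go to the site listed first.
    target = min(candidates, key=lambda s: len(s.queue))
    target.queue.append(job_id)
    return target.name


sites = [
    Site("site-a", ["run-001"]),              # hypothetical names
    Site("site-b", ["run-001", "run-002"]),
]

for job_id, dataset in [("job-1", "run-001"),
                        ("job-2", "run-001"),
                        ("job-3", "run-002")]:
    print(job_id, "->", broker_submit(job_id, dataset, sites))
```

In this model the second job for the same dataset lands on the other site because the first site already has a job waiting.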

Foster said that the development of new networks is allowing CERN to redesign how the grid works. As international networks have become highly reliable at relatively low cost, he said there was a desire to increase the use of networking - to evolve the model continually. Thanks to cheaper networking, large amounts of data can be moved around quickly and cost-effectively. "We are starting to look at the ability for jobs to dynamically pull data to wherever they run. The system can then dynamically cache the data."
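The pull-and-cache model Foster describes can be illustrated with a small cache that fetches a dataset over the network the first time a job asks for it and serves later requests locally. The sketch is hypothetical throughout: the SiteCache class, the fetch_remote callback and the least-recently-used eviction policy are assumptions, since the article does not say how the real system would cache data.

```python
from collections import OrderedDict


class SiteCache:
    """Toy per-site cache that pulls datasets on demand."""

    def __init__(self, fetch_remote, capacity=2):
        self._fetch_remote = fetch_remote   # callable: dataset name -> data
        self._capacity = capacity           # max datasets held locally
        self._store = OrderedDict()         # oldest-used entry first
        self.remote_fetches = 0

    def get(self, dataset):
        """Return the data, pulling it over the network only on a miss."""
        if dataset in self._store:
            self._store.move_to_end(dataset)   # mark as recently used
            return self._store[dataset]
        data = self._fetch_remote(dataset)
        self.remote_fetches += 1
        self._store[dataset] = data
        if len(self._store) > self._capacity:
            # ASSUMPTION: evict the least recently used dataset.
            self._store.popitem(last=False)
        return data


def fake_remote_fetch(dataset):
    """Stand-in for a wide-area transfer; returns placeholder content."""
    return f"contents of {dataset}"


cache = SiteCache(fake_remote_fetch, capacity=2)
for name in ["run-001", "run-001", "run-002", "run-001"]:
    cache.get(name)
print("remote fetches:", cache.remote_fetches)
```

Four requests here trigger only two wide-area fetches, because the repeated requests for the same dataset are served from the local copy.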
