Case study: OpenStack hybrid cloud powers CERN’s experiments with universe

Particle physics lab CERN uses OpenStack private cloud and Rackspace cloud services to provide the high-throughput compute power researchers need

This article can also be found in the Premium Editorial Download: Computer Weekly: Hacking IT from the inside

IT scale, big data challenges and high-throughput compute needs are of a different order of magnitude altogether for CERN, the organisation that aims to find out what the universe is made of by conducting experiments in its Large Hadron Collider (LHC).

For one, the LHC detectors at CERN produce a staggering one petabyte (1PB, equivalent to a million gigabytes) of data per second when running. This is likely to increase during the next LHC run in 2015, and to grow further in future generations of colliders under discussion, such as the Future Circular Collider (FCC), says Tim Bell, infrastructure manager at CERN. The LHC currently produces more than 25PB of data annually.

“By 2020, we estimate we will have 0.5 exabyte of data every year,” Bell says. That’s a whopping half a billion gigabytes.
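To put those volumes on one scale, the short sketch below converts them to gigabytes. It assumes decimal (SI) prefixes, where a gigabyte is 10^9 bytes, a petabyte 10^15 and an exabyte 10^18; the function and variable names are illustrative only and do not come from CERN.

```python
# Decimal (SI) storage units expressed in bytes (an assumption; binary
# units such as PiB would give slightly different figures).
GB = 10**9
PB = 10**15
EB = 10**18


def to_gigabytes(num_bytes: int) -> float:
    """Convert a byte count to decimal gigabytes."""
    return num_bytes / GB


detector_rate_gb = to_gigabytes(1 * PB)   # produced per second while running
annual_now_gb = to_gigabytes(25 * PB)     # current annual figure from the article
annual_2020_gb = to_gigabytes(EB // 2)    # the 0.5 exabyte estimate for 2020

print(f"1 PB   = {detector_rate_gb:,.0f} GB")   # 1,000,000 GB
print(f"25 PB  = {annual_now_gb:,.0f} GB")      # 25,000,000 GB
print(f"0.5 EB = {annual_2020_gb:,.0f} GB")     # 500,000,000 GB
print(f"Growth factor: {annual_2020_gb / annual_now_gb:.0f}x")
```

On these assumptions, the 2020 estimate is 20 times the 25PB a year the LHC produces today.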

One of CERN’s main challenges is analysing and sharing these massive amounts of data and performing accurate and efficient simulations of its instruments. Computing plays a critical role here, and the IT team strives to provide a manageable, cost-effective, secure, large-scale computing infrastructure to the scientists and researchers working at the world’s largest particle physics lab.

“Ours is not so much a high-performance computing infrastructure. It is, rather, a high-throughput computing infrastructure,” Bell says. “A lot of compute tasks have to be carried out, but they are all executed independently.”
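The distinction Bell draws can be illustrated with a minimal sketch of a high-throughput workload: many jobs that share no state and never communicate, so they can be handed to whatever workers are free. The code is not CERN’s; `simulate_event` is a hypothetical stand-in for one physics job, and a local thread pool stands in for a batch system that would, in practice, spread jobs across many machines.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def simulate_event(seed: int) -> float:
    """Hypothetical stand-in for one independent job."""
    rng = random.Random(seed)  # private generator: no state shared between jobs
    return sum(rng.random() for _ in range(1000)) / 1000


def run_batch(seeds, max_workers: int = 4) -> list:
    """Run every job independently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(simulate_event, seeds))


if __name__ == "__main__":
    results = run_batch(range(8))
    print(len(results), "independent jobs finished")
```

Because no job depends on another, the batch gives the same results whether it runs on one worker or many. That independence is what lets capacity be added simply by adding more servers or cloud instances.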

CERN currently has a data archive of 100PB, 11,000 servers, and 75,000 disk-tapes. It provides 10,000 scientists with the compute capacity for their particle physics experiments and has a total budget of a billion Swiss francs (£650m).

But data management is only one challenge; providing IT resources on demand is another for the IT team. “How do we scale without raising costs was a question we’d ask ourselves,” says Bell.

Two years ago, the organisation chose the open source private cloud computing platform OpenStack to run its IT resources at scale. The aim in moving to a large-scale infrastructure-as-a-service (IaaS) OpenStack-based cloud was to help CERN expand its compute resources significantly and support scientists worldwide using the infrastructure to unlock the secrets of the universe.

At that time, CERN’s OpenStack cloud was running on Essex, the fifth release of the cloud software. Today, it runs on Havana, the eighth and most recent release of OpenStack, which is designed for building public, private and hybrid clouds. Havana incorporates nearly 400 new features to support software development, data management and application infrastructure at scale.

“The upgrade, completed in February this year, required a lot of due diligence and planning, but we succeeded in completing it in six hours without much downtime,” says Bell.

When the accelerators or the LHC are not running, the IT resources remain idle. The IT team carries out upgrades when the accelerators are shut down, and has detailed the move from the older OpenStack to the newer version in a blogpost.

Bell, who is also an OpenStack board member, says: “We definitely see great value in open source technologies like OpenStack. They foster continuous technological improvements through community contributions, while also giving us the ability to quickly address challenges, such as massive scaling, by leveraging the work of others.” 

But the OpenStack private cloud CERN built was not enough for its enormous compute needs. “Our private cloud is heavily loaded and there’s a huge pressure on the cloud compute infrastructure,” Bell says.

How cloud federation is helping CERN

CERN’s cloud infrastructure has more than 65,000 cores. After the next hardware upgrade, more than 35,000 new cores will be added. The IT team is also continuing to migrate existing in-house servers to OpenStack compute nodes at an average of 100 servers per week.

So the organisation has picked Rackspace’s public cloud infrastructure (also OpenStack-based) to run its workload-intensive hydro-physics applications. It has also been using Rackspace’s managed private cloud to run more mission-critical internal workloads since October 2013.

“We chose Rackspace private cloud because it will bring true interoperability to our workloads and we can move applications back and forth between the two private clouds,” Bell explains.

Together CERN’s IT team and Rackspace are creating a reference architecture and operational model for federated cloud services between Rackspace’s private and public clouds and CERN’s own OpenStack-powered cloud.

A federated cloud (or cloud federation) is the deployment and management of multiple external and internal cloud computing services to match business needs. “Our computing requirements are very unique,” Bell says.

CERN has collaborated with the cloud provider in the past to burst its workloads into Rackspace’s public cloud when necessary, according to Bell. The expanded collaboration will see the two organisations federate CERN’s current managed services into Rackspace’s open public and private cloud environments and help create cloud federation technologies.

CERN is also deploying the Rackspace private cloud platform onto servers it uses for physics experiments to experience the benefits of hybrid cloud services.

One of the biggest benefits of having an interoperable cloud platform is the IT team’s ability to respond quickly to the compute needs of staff and scientists. “It’s nice not to be on a public procurement cycle which took as long as 280 days,” Bell says. 

The public procurement model involved steps such as the user expressing an IT requirement, the IT team developing a survey for suppliers, suppliers responding, the IT team testing the services, the suppliers delivering hardware and accepting payment, and the IT team setting up the system for the user. “Today, on the cloud it happens over a cup of coffee.”

But CERN was able to deploy federated cloud technologies and hybrid services quickly because the IT team had virtualised all the Linux servers. “We have a team of six guys in IT managing the journey to the cloud. So if we weren’t fully virtualised, we would be in trouble,” Bell says.

But the team had to overcome several cultural barriers in its journey to the cloud. “There was one group that said ‘don’t rush’ while another group had people who were like wizards – they had a very deep knowledge of legacy apps and would resist the move to the cloud,” Bell says.

In addition to its OpenStack-based clouds, CERN uses other major public cloud services, such as those from AWS, and has a datacentre in Budapest, Hungary.

What next for CERN IT then? 

“We are heading up to having 100,000 cores by 2015,” Bell says. “OpenStack features are coming thick and fast, so we will spend some time assessing what’s best for our needs and pick the key features to integrate into our cloud infrastructure. Then there are plans to develop database as a service, IT orchestration services and IT management.”
