
Cambridge replaces enterprise storage with direct attach Dell-Lustre setup

Cambridge University uses Dell's MD direct attached storage with Intel’s Lustre to support its data-intensive research programmes

Cambridge University has used Dell and Intel storage technology to build cost-effective enterprise storage for its research programmes. In the past, the university's research was computationally intensive, but it has since shifted towards much heavier data loads.

Storage at the university used to be proprietary, but Paul Calleja, HPCS director at the University of Cambridge, said that with data growing at a petabyte a quarter, the university's storage requirements now exceeded its budget: "We could no longer afford to do the research programmes with proprietary hardware," he said.

Instead, the university has built a petabyte of storage from seven Dell MD storage arrays, tied together with Intel's Lustre parallel file system. Calleja said: "The setup gives a phenomenal price point and performance.

"We've now gone through a commoditisation process with storage." 

Calleja said that by combining Dell MD storage with Intel's enterprise Lustre, it is possible to build enterprise-class storage from networked, direct-attached storage. He said the configuration can outperform traditional enterprise storage.
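Lustre gets its performance by striping each file across many storage targets. The article does not describe how Cambridge laid out its seven arrays. The sketch below is therefore only an illustration of the round-robin (RAID-0 style) striping Lustre uses. It assumes a hypothetical 1 MiB stripe size and a stripe count of seven. The `locate` helper is our own and is not part of any Lustre API.

```python
# Minimal sketch of Lustre-style round-robin (RAID-0) file striping.
# Illustration only: the stripe settings are hypothetical, not Cambridge's.

MIB = 1024 * 1024


def locate(offset: int, stripe_size: int, stripe_count: int) -> tuple[int, int]:
    """Map a byte offset in a file to (stripe index, offset within that object).

    Stripe 0 goes to the first object/target in the file's layout, stripe 1
    to the second, and so on, wrapping around after stripe_count stripes.
    """
    stripe_number = offset // stripe_size          # which stripe-sized chunk
    stripe_index = stripe_number % stripe_count    # which object in the layout
    row = stripe_number // stripe_count            # full rounds completed before it
    object_offset = row * stripe_size + offset % stripe_size
    return stripe_index, object_offset


if __name__ == "__main__":
    # Hypothetical layout: 1 MiB stripes spread over 7 storage targets.
    for off in (0, 1 * MIB, 6 * MIB + 10, 7 * MIB, 15 * MIB + 5):
        print(off, locate(off, stripe_size=MIB, stripe_count=7))
```

Consecutive stripes land on different targets. A large sequential read or write is therefore served by several arrays in parallel, and that is where the aggregate bandwidth of such a setup comes from.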

Datacentre upgrade boosts HPC

The University of Cambridge has used Dell hardware for its high-performance computing (HPC) for nine years, and recently upgraded its datacentre. Calleja said: "High-performance computing is essential to drive our research programme. We have built a new £20m datacentre, which enables us to grow our HPC."

The facility will enable Cambridge University to undertake research it could not perform before. Calleja said research could be done faster and in greater depth, which makes the university more competitive.

Calleja said that, through its partnership with Dell, the university had built large-scale commodity-based systems. "We have pushed commodity systems to new levels of performance and size, which has enabled us to break the mould in the UK in terms of HPC, demonstrating that commodity systems can compete with proprietary hardware from the likes of Cray or Hitachi."


He said the Dell cluster achieved the same level of performance as the proprietary supercomputers at a fraction of the cost. The university has begun offering supercomputing as a service to businesses. In 2012, Cambridge University worked with the Lotus Formula 1 team to model the aerodynamics of Lotus's racing cars. It also worked with a small graphic arts company to render the special effects in the Planet of the Apes movie.

Keeping cool

Although commodity HPC has enabled the university to keep costs within budget, Calleja said the challenge would be to keep the datacentre's power consumption below two megawatts. He said: "Power efficiency is one of the leading areas of innovation in the industry.

"We have used back-of-rack water-cooled doors and evaporative chiller units, which has given us a PUE of 1.15. So 15% of the energy going in is being used for cooling, which is pretty good and saves a lot of money when you are burning a megawatt of power.

"Efficient ways of scheduling jobs and virtualisation means you can unlock two to times greater power efficiency."
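PUE (power usage effectiveness) is the ratio of total facility energy to the energy delivered to IT equipment. A PUE of 1.15 therefore means that overheads such as cooling add 15% on top of the IT load. That works out at roughly 13% of the total energy going in. The worked arithmetic below uses a hypothetical round figure of 1,000 kW of IT load, echoing the "megawatt" in the quote. The helper name is our own.

```python
# Worked PUE arithmetic: PUE = total facility energy / IT equipment energy.
# The 1,000 kW IT load is a hypothetical round figure, not a measured value.


def pue_breakdown(it_load_kw: float, pue: float) -> dict:
    total_kw = it_load_kw * pue            # everything the facility draws
    overhead_kw = total_kw - it_load_kw    # cooling, power conversion, etc.
    return {
        "total_kw": total_kw,
        "overhead_kw": overhead_kw,
        "overhead_vs_it": overhead_kw / it_load_kw,   # 0.15 at PUE 1.15
        "overhead_vs_total": overhead_kw / total_kw,  # about 0.13 at PUE 1.15
    }


if __name__ == "__main__":
    for pue in (1.15, 2.0):
        b = pue_breakdown(1000.0, pue)
        print(pue, {k: round(v, 3) for k, v in b.items()})
```

At a PUE of 2.0, the same 1 MW of IT load would draw 2 MW in total. This shows why a low PUE matters when the facility has to stay under a two-megawatt ceiling.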

High-performance parallel programming

The university has been running an Nvidia GPU cluster for four years. "Our latest system is a Dell system in partnership with Nvidia, which was part-funded by the Square Kilometre Array radio telescope project," Calleja said.


The GPU cluster comprises 256 graphics cards. Parallel applications share data across all the cards to optimise performance. But Calleja said: "If the time taken to synchronise data between the cards takes longer than the time the GPU card requires to do the calculation, you're kind of buggered: your scalability goes. So we spent a lot of time with Dell developing a system integration architecture to get around this latency problem."
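Calleja's point can be made concrete with a toy strong-scaling model. The sketch makes two assumptions: each step's work is divided evenly over the cards, and each step also pays a fixed synchronisation cost. The function names and the millisecond figures are hypothetical, not measurements from the Cambridge cluster.

```python
# Toy strong-scaling model (hypothetical numbers, not Cambridge measurements).
# Each step: compute time shrinks as work is split over more GPUs, but the
# synchronisation cost of exchanging data between cards stays fixed.


def step_time_ms(n_gpus: int, total_compute_ms: float, sync_ms: float) -> float:
    compute = total_compute_ms / n_gpus       # work divides evenly across cards
    sync = sync_ms if n_gpus > 1 else 0.0     # a single card has nothing to exchange
    return compute + sync


def speedup(n_gpus: int, total_compute_ms: float, sync_ms: float) -> float:
    """Time per step on one GPU divided by time per step on n GPUs."""
    return step_time_ms(1, total_compute_ms, sync_ms) / step_time_ms(
        n_gpus, total_compute_ms, sync_ms
    )


if __name__ == "__main__":
    work_ms = 100.0  # hypothetical single-GPU time per step
    for sync in (1.0, 0.1):  # hypothetical per-step synchronisation cost
        for n in (16, 64, 256):
            s = speedup(n, work_ms, sync)
            print(f"sync={sync} ms  gpus={n:3d}  speedup={s:6.1f}  efficiency={s / n:.2f}")
```

Under these assumptions, the per-card compute time at 256 cards (100 ms / 256, about 0.39 ms) is below the 1 ms synchronisation cost. At that point adding cards buys little: the speedup at 256 cards is about 72, against roughly 204 if synchronisation were cut to 0.1 ms.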

Cambridge University used a technique called remote direct memory access (RDMA), which Calleja said provided low-latency memory transfers from one memory space to another, improving the scalability of parallel applications.
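One way to see why cutting out intermediate copies lowers latency is a simple latency-plus-bandwidth model of a single GPU-to-GPU message between nodes. The model compares two paths. The staged path copies data into host memory before and after the network hop. The idealised direct path lets the network adapter move data between GPU memories. The helper names and every latency and bandwidth figure are hypothetical assumptions, not details from the Cambridge system.

```python
# Alpha-beta (latency + size/bandwidth) model of sending one message between
# GPUs in different nodes. All figures are hypothetical, and the model assumes
# RDMA's benefit comes from skipping the staging copies through host memory.


def transfer_us(size_bytes: int, latency_us: float, bandwidth_gb_s: float) -> float:
    """Fixed startup latency plus time on the wire (1 GB/s = 1,000 bytes per us)."""
    return latency_us + size_bytes / (bandwidth_gb_s * 1e3)


def staged_us(size_bytes: int) -> float:
    # GPU -> host copy, network hop, host -> GPU copy on the receiving node
    pcie = transfer_us(size_bytes, latency_us=10.0, bandwidth_gb_s=12.0)
    net = transfer_us(size_bytes, latency_us=2.0, bandwidth_gb_s=6.0)
    return pcie + net + pcie


def direct_us(size_bytes: int) -> float:
    # Idealised RDMA path: the adapter moves data between GPU memories
    # directly, so only the network hop is counted
    return transfer_us(size_bytes, latency_us=2.0, bandwidth_gb_s=6.0)


if __name__ == "__main__":
    for size in (4 * 1024, 64 * 1024, 1024 * 1024):
        print(f"{size:>8} bytes  staged={staged_us(size):8.1f} us  direct={direct_us(size):8.1f} us")
```

In this model the relative gain is largest for small messages, where the fixed per-copy latencies dominate the total time.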
