EMedLab, a partnership of seven research and academic institutions, has built a 5.5PB private cloud high-performance computing (HPC) cluster on Red Hat Enterprise Linux OpenStack, using Cinder block storage and IBM’s Spectrum Scale (formerly GPFS) parallel file system.
The organisation rejected use of object storage – an emerging choice for very large capacity research data use cases – and also rejected use of the public cloud because of concerns over control and security of data.
EMedLab built the HPC cluster – in conjunction with Sheffield-based HPC specialist OCF – to provide compute resources to researchers working on genetic susceptibility to, and appropriate treatments for, cancers, cardiovascular and rare diseases. Researchers can request and configure compute resources of up to 6,000 cores and storage for their projects via a web interface.
The HPC cluster is hosted at the JISC datacentre in Slough. It comprises 252 Lenovo Flex System blades with 24 cores and 500GB of RAM each. Each blade runs a KVM hypervisor upon which virtual machines are created for research projects, while private cloud management functions come from Red Hat Enterprise Linux OpenStack.
Storage – to a total capacity of 5.5PB – is a combination of OpenStack Cinder block storage and IBM Spectrum Scale. Physical storage capacity comes via Lenovo GSS24 and GSS26 SAS JBODs with 1.2PB on a faster scratch tier and 4.3PB of bulk capacity on larger SAS drives.
Connectivity is Ethernet with 10Gbps Mellanox NICs.
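As a quick sanity check on the hardware figures quoted above, the short Python sketch below multiplies out the blade count and adds up the two storage tiers. The constants are taken from the article; the function and variable names are illustrative only and are not part of any eMedLab tooling.

```python
# Figures as quoted in the article.
BLADES = 252
CORES_PER_BLADE = 24
SCRATCH_TIER_PB = 1.2   # faster scratch tier
BULK_TIER_PB = 4.3      # bulk capacity on larger SAS drives


def total_cores(blades: int, cores_per_blade: int) -> int:
    """Return the total number of physical cores across all blades."""
    return blades * cores_per_blade


def total_capacity_pb(*tiers_pb: float) -> float:
    """Return the combined capacity of the storage tiers, rounded to 0.1PB."""
    return round(sum(tiers_pb), 1)


cores = total_cores(BLADES, CORES_PER_BLADE)
capacity = total_capacity_pb(SCRATCH_TIER_PB, BULK_TIER_PB)
print(f"{cores} cores, {capacity}PB of storage")
```

The result, 6,048 cores, is consistent with the "up to 6,000 cores" available to researchers, and the two tiers add up to the 5.5PB headline capacity.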
The combination of block storage and a parallel file system was chosen because the eMedLab cluster needs the Posix file system compatibility of GPFS married to the private cloud capabilities – such as multi-tenancy – of OpenStack and its Cinder block access storage method, said Bruno Silva, operations lead for eMedLab.
“It’s a non-standard approach, to provide flexibility,” said Silva. “OpenStack provides storage to virtual machines through Cinder and Cinder utilises files within Spectrum Scale, which gives us a Posix file system to share data between virtual machines.
“In OpenStack we opted for Cinder block storage and not Swift because we don’t yet have workloads designed for object storage. GPFS does, however, have object storage on its roadmap. We’ve also had discussions about Manila [OpenStack’s file access storage system] but we’re not convinced it is mature enough to use yet.
“We could have gone for object storage if we were certain we were going for a pure OpenStack approach. But the system we’ve gone for also has GPFS/Spectrum Scale to give good performance for Posix and Cinder.”
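To illustrate the arrangement Silva describes, in which Cinder volumes live as files inside Spectrum Scale, the Python sketch below renders the kind of backend section that could appear in a `cinder.conf`. It assumes the upstream OpenStack Spectrum Scale (GPFS) volume driver; the article does not say which driver or settings eMedLab uses, and the mount point, pool name and section name here are hypothetical placeholders.

```python
import configparser
import io


def build_cinder_backend(mount_point: str, storage_pool: str) -> str:
    """Render a cinder.conf backend section for the Spectrum Scale (GPFS) driver.

    The option names follow the upstream Cinder GPFS driver; the values passed
    in are hypothetical and not taken from eMedLab's deployment.
    """
    config = configparser.ConfigParser()
    config["gpfs"] = {
        # Cinder driver that stores each volume as a file in a GPFS file system.
        "volume_driver": "cinder.volume.drivers.ibm.gpfs.GPFSDriver",
        # Directory inside the GPFS file system where volume files are created.
        "gpfs_mount_point_base": mount_point,
        # GPFS storage pool to which new volumes are assigned.
        "gpfs_storage_pool": storage_pool,
        # Create volumes as sparse files so unused space is not pre-allocated.
        "gpfs_sparse_volumes": "True",
        "volume_backend_name": "gpfs",
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Hypothetical path and pool name, for illustration only.
CINDER_FRAGMENT = build_cinder_backend("/gpfs/example/cinder/volumes", "system")
print(CINDER_FRAGMENT)
```

Because the volumes are ordinary files in the parallel file system, the same Spectrum Scale cluster can serve both block volumes to virtual machines and a shared Posix namespace, which is the flexibility Silva points to.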
The choice between public and private cloud
During the procurement process, the private cloud/OpenStack approach was weighed against buying in a more off-the-shelf HPC system, as well as use of public cloud.
“All the options were close in terms of quality, but there were various constraints in our requirements and with the OCF system we found a better fit,” said Silva.
“We preferred a private cloud system because it gave us control over our infrastructure and we know it is secure. We’re keeping an eye on public cloud. We know it is coming, but it has to overcome hurdles in terms of cost and regulatory concerns. But for the work we’re doing private cloud is most appropriate at the moment.”