The Wellcome Trust Centre has deployed a high-performance computing cluster based on Fujitsu blades, Mellanox InfiniBand and DataDirect Networks storage systems to support statistical genetics research.
Designed in conjunction with OCF, a provider of high-performance computing (HPC), data management, big data storage and analytics, the cluster enables researchers to run statistical analysis on the human genome.
The hardware powers applications that analyse small genetic differences across a population of 1,000 people.
The cluster uses Fujitsu BX900 blades with Intel Ivy Bridge CPUs, giving it 2.6 times the performance of its predecessor, built in 2011.
It boasts 1,728 cores of processing power, up from the 912 of its forerunner, with 16GB of 1866MHz memory per core compared with a maximum of 8GB per core on the older cluster of the Wellcome Trust Centre for Human Genetics (WTCHG).
Robert Esnouf, head of the research computing core at WTCHG, said: “If you are interested in a certain disease, you can partition the genome and analyse the genetic difference between those individuals who have a medical condition like diabetes and those that do not.”
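As a rough illustration of the kind of comparison Esnouf describes, the sketch below computes a Pearson chi-squared statistic for a single genetic variant in people with and without a condition. The allele counts and the function name are hypothetical, and the chi-squared test is a textbook choice assumed here for illustration, not a method attributed to WTCHG.

```python
def chi_squared_2x2(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-squared statistic for a 2x2 table of allele counts.

    Rows are cases and controls; columns are the variant (alt) and
    reference (ref) alleles.
    """
    n = case_alt + case_ref + control_alt + control_ref
    row_case = case_alt + case_ref
    row_control = control_alt + control_ref
    col_alt = case_alt + control_alt
    col_ref = case_ref + control_ref
    denominator = row_case * row_control * col_alt * col_ref
    if denominator == 0:
        raise ValueError("every row and column needs at least one observation")
    # Shortcut form of the chi-squared statistic for a 2x2 table.
    return n * (case_alt * control_ref - case_ref * control_alt) ** 2 / denominator


# Hypothetical counts: 500 cases and 500 controls, two alleles per person.
statistic = chi_squared_2x2(case_alt=60, case_ref=940,
                            control_alt=30, control_ref=970)
print(f"chi-squared statistic: {statistic:.2f}")
```

A larger statistic means the variant's frequency differs more between the two groups than chance alone would suggest. A real study repeats a test like this across a very large number of positions in the genome, which is where the processing demand comes from.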
Processing power limits the number of people whose DNA makeup can be analysed statistically. But the more DNA that is analysed, the greater the accuracy of the statistical analysis.
Typically, the data for a single human genome takes up 30GB. Esnouf said that processing the DNA data of a thousand individuals requires “a lot of I/O remapping”.
He added: “An individual may have thousands of different genetic variations. The more people you can get, the more chance you have of finding low-frequency genetic differences.”
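Esnouf's point about sample size can be put in numbers. The sketch below estimates the chance that a variant with a given population frequency appears at least once in a sample, using the formula 1 - (1 - f)^(2n) for n people carrying two chromosome copies each. It assumes the copies are drawn independently; the 0.1% frequency and the function name are illustrative assumptions, not figures from WTCHG.

```python
def detection_probability(allele_frequency, people):
    """Chance that a variant shows up at least once in the sample.

    Assumes two chromosome copies per person and independent draws.
    """
    chromosomes = 2 * people
    return 1.0 - (1.0 - allele_frequency) ** chromosomes


# Hypothetical variant carried on 0.1% of chromosomes in the population.
for people in (100, 1000):
    probability = detection_probability(0.001, people)
    print(f"{people} people: {probability:.1%} chance of seeing the variant")
```

Under these assumptions, a variant that would usually be missed in a sample of 100 people becomes likely to appear in a sample of 1,000.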
The new cluster works alongside a second production cluster. Both share a Mellanox FDR InfiniBand network that links the compute nodes to a DDN GridScaler SFA12K storage system, whose controllers can read block data at 20Gbps. According to WTCHG, this speed is essential for keeping the cluster at maximum utilisation and consistently fed with genomic data.
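A back-of-the-envelope calculation shows why storage throughput matters at this scale. The sketch below uses only the figures quoted in the article: 30GB per genome, 1,000 individuals and a 20Gbps read rate. It assumes decimal units, reads 20Gbps literally as gigabits per second, and ignores protocol overhead and contention. If the underlying figure is in gigabytes per second, the time falls by a factor of eight.

```python
GENOME_SIZE_GB = 30    # gigabytes per genome, as quoted in the article
INDIVIDUALS = 1000     # study population, as quoted in the article
READ_RATE_GBPS = 20    # quoted read rate, taken as gigabits per second

# Total volume for one full pass over the dataset.
total_gb = GENOME_SIZE_GB * INDIVIDUALS
total_tb = total_gb / 1000

# Convert gigabits per second to gigabytes per second (8 bits per byte).
read_rate_gb_per_s = READ_RATE_GBPS / 8

seconds_per_pass = total_gb / read_rate_gb_per_s
hours_per_pass = seconds_per_pass / 3600

print(f"dataset size: {total_tb:.0f}TB")
print(f"one sequential pass: {hours_per_pass:.1f} hours")
```

On these assumptions, a single sequential read of the whole dataset takes hours even at the full quoted rate, so any shortfall in storage throughput would leave compute cores idle.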
Esnouf said that GPU technology, which is increasingly being used for HPC work, was not suitable for his applications. Statistical genetic analysis requires the transfer of large amounts of data, making the applications less efficient when run on fixed-memory GPU devices.
Esnouf is also keeping an eye on the cloud. “We have a watching brief on OpenStack.”
However, he said that once a user moves off standard AWS, the price goes up, making it less economical. Given the size of the dataset, he said it would be very slow and expensive to move genetic data processing into the cloud.