In this guest post, Spencer Lamb, vice president of Harlow-based colocation provider Kao Data, discusses how HPC datacentres are aiding the fight against Covid-19
All science is data-driven: finding and classifying how the world works, then modelling the underlying rules that govern its behaviour. The need for good science is most urgent in medicine, especially in current times where knowledge reduces suffering and saves lives, yet medical data is the most complex, messy and massive data there is.
This has limited how computing can help life sciences – until recently. With the evolution of high-performance computing (HPC) over the past thirty years, the huge amounts of information inherent in living systems can at last be treated with the same industrial efficiency long enjoyed by the engineering, financial and media fields.
In 2018, Intersect360 Research reported that total worldwide market revenue for HPC was $36.1 billion, forecasting it to grow to $38.7 billion in 2019 and $41.4 billion in 2020. Earlier this year the company stated that 2020 market revenue was on course to fall significantly short of the previous forecast as a direct result of the pandemic. However, further data suggests there has only been a 3.7% drop against a worst-case scenario of 12%, which indicates that HPC usage remains strong amidst the Covid-19 pandemic.
The need for HPC and high-density datacentre infrastructure to underpin life science and medical research, especially the search for a Covid-19 vaccine, is likely only to increase. Moreover, as more organisations turn over their supercomputers to help research the virus, a dynamic not yet captured by the projections is that where HPC is being used is shifting, particularly towards the London-to-Cambridge UK Innovation Corridor.
Bringing science to life
The use of huge data sets, automated analytics and AI is spread across many life science fields. Learning from terabytes of images, AI diagnostics can match clinicians in detecting pathologies in medical scan and X-ray data, while cryo-electron microscopy images can reveal 3D molecular structure within cells and viruses much faster than traditional X-ray diffraction techniques.
Considering that a single modern medical instrument like a lattice light-sheet microscope can generate two to three terabytes of image data in a couple of hours, these techniques need the kind of storage and massively parallel, GPU-based technologies that can only be sensibly provisioned in specialised datacentres, built to meet these standards of technical excellence.
Moreover, to transmit terabytes of data across the internet effectively, direct 100Gbps wavelength connections are mandatory. This, however, can present a host of challenges for unprepared operators, so for many research organisations the close proximity and greater levels of compute offered through on-premises, high-performance edge datacentres at scale can help to increase operational reliability.
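A back-of-envelope calculation illustrates why link capacity matters at these data volumes. The 3TB dataset size echoes the microscopy figures above, while the 80% protocol-efficiency factor is purely an illustrative assumption, not a figure from any specific deployment:

```python
# Back-of-envelope: time to move a ~3TB imaging run over links of
# different speeds. Protocol efficiency of 80% is an assumed value.
TB = 1e12  # bytes in a terabyte (decimal convention)

def transfer_time_hours(size_bytes: float, link_gbps: float,
                        efficiency: float = 0.8) -> float:
    """Hours to move size_bytes over a link of link_gbps gigabits/s,
    derated by an assumed protocol efficiency factor."""
    bits = size_bytes * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600

for gbps in (1, 10, 100):
    print(f"{gbps:>3} Gbps: {transfer_time_hours(3 * TB, gbps):.2f} h")
```

On these assumptions, a single run takes most of a working day to move at 1Gbps but only around five minutes at 100Gbps, which is why dedicated wavelength connections, or co-locating compute next to the instruments, become essential at this scale.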
Other instrumental HPC tasks in genomics and life sciences include the development of individually-tuned medicines, which acknowledge the fact that large variations exist in human biology and that different treatments work with different efficiencies in different patients. For this realisation to turn into clinical practice, though, wholesale genetic analysis of individuals must be matched with very large databases of drug actions and interactions.
Elsewhere, the near-infinite possible configurations of structures and interactions between proteins make the computational creation and testing of candidates the only productive approach to exploring whole classes of potential drugs, or to understanding processes, both normal and abnormal, of metabolism, infection, immunology and drug efficacy.
Put crudely, the more processing units, the larger the available memory, the faster the networking and the more capacious the disk storage, then the more effective the research. HPC is the only technology capable of managing such research at the scale needed.
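The "more hardware, more effective research" intuition above has a well-known caveat: Amdahl's law caps the speedup from adding processors at the fraction of the workload that parallelises. A minimal sketch, in which the 95% parallel fraction is an illustrative assumption rather than a measured workload profile:

```python
# Amdahl's law: speedup from n processors when a fraction p of the
# work parallelises. p = 0.95 below is an assumed, illustrative value.
def speedup(p: float, n: int) -> float:
    """Theoretical speedup for parallel fraction p on n processors."""
    return 1.0 / ((1 - p) + p / n)

for n in (16, 256, 4096):
    print(f"{n:>5} cores: {speedup(0.95, n):.1f}x")
```

Even at 4,096 cores the speedup here stays under 20x, which is one reason HPC systems are engineered to keep the serial fraction small, and why memory, networking and storage must scale alongside the processors rather than lag behind them.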
HPC investments hold firm through Covid-19
Due to their dependency on high-density processing and graphics processing units (GPUs), with growing volumes of compute cores and storage to increase the speed of output, many UK-based pharmaceutical and medical institutions continue to invest heavily in HPC to support scientific research.
A 2017 report by the UK’s Medical Research Council (MRC), “Mapping the Landscape of UK Health Data Research & Innovation” identifies some thirty HPC centres involved in life sciences.
At the time of writing, the Innovation Corridor was hosting around thirteen of the major research projects named in the report. And today, as more companies share resources and locate closer to one another, that number will undoubtedly have increased, further positioning the area as a national hub for HPC and UK biomedical research.
A typical project, one funded by the MRC itself, is eMedLab, devoted to data-driven discovery for personalised medicine. Bringing together the Francis Crick Institute, four London universities, the Wellcome Sanger Institute, and the European Bioinformatics Institute (EMBL-EBI), it spans the full length of the Innovation Corridor from London to Cambridge, to create one of the world’s largest secure biomedical cloud infrastructures.
This collaboration is responsible for integrating and openly sharing heterogeneous data from personal healthcare records, imaging, pharmacoinformatics and genomics. Around one hundred researchers use it to develop analytical tools to work with cancers, cardiovascular, autoimmune and other rare diseases.
Today there are thousands of rare diseases, many with only minute numbers of cases. Given such numbers, no single clinician can ever know more than a handful, or recommend the best path to diagnosis for all of them.
Enterprise-scale computing has therefore become essential for tracking and monitoring these types of illness. It enables continuous analysis of both patient and research data, while providing the end user, who is often unaware of the complex processing and hardware systems involved, with real-time information on prevention, treatment and cure.
This, like all the other projects and installations detailed in the report, is world-class, and the UK, especially within the Innovation Corridor, operates at the highest level of collaborative life science research. This gives the area a deep pool of clinical researchers, data scientists, system architects, and hardware, software and networking specialists, which encourages further innovation.
Located on its doorstep is the specialist datacentre infrastructure to meet and underpin that HPC demand. Supercomputers need access to ultra-resilient power, efficient cooling (both environmentally and economically), low-latency connectivity, networking, industrial-grade hosting infrastructure and, crucially, plenty of scalability to expand as compute workloads grow. Many applications also need to be physically close to the researchers and research teams using them, to minimise latency and maximise data throughput.
The Kao Data campus has been built on principles of technical excellence to deliver HPC to this major research corridor. Located at its heart, the campus is already serving a number of major life science, AI and entrepreneurial startup customers, providing them with highly scalable and cost-effective infrastructure that meets the demands of HPC workloads.
The datacentre is designed to run entirely on renewable energy with ultra-efficient free-air cooling, offering customers a lower total cost of ownership (TCO) and keeping operating expenses (OpEx) down while guaranteeing the highest levels of sustainability.
As medical research embarks on its biggest and most important test yet, with virtually the entire industry moving to attack Covid-19, the need to provide the most effective HPC datacentre infrastructure as efficiently as possible has never been greater. All the signs are that the challenge is being met.