Enterprises, scientists and academics turn to public cloud for HPC

Organisations are starting to use supercomputers on demand via public cloud to access HPC capabilities

Corporates, scientists, engineers and researchers all need access to high-performance compute power, low latency and high bandwidth for their data-intensive workloads. They are increasingly using supercomputers on demand via public cloud infrastructure to access the HPC capabilities they need at affordable costs.

High-performance computing (HPC) uses parallel processing to run advanced applications efficiently, reliably and quickly. HPC systems typically execute in excess of a teraflop, or 10¹² floating-point operations per second.
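For a sense of scale, a system's theoretical peak throughput can be sketched from its core count, clock speed and per-cycle floating-point throughput. The figures below are hypothetical, purely for illustration, and not drawn from the article.

```python
# Illustrative sketch: theoretical peak FLOPS of a cluster.
# peak = nodes * cores_per_node * clock_hz * flops_per_cycle
# All figures below are hypothetical.

def peak_flops(nodes, cores_per_node, clock_ghz, flops_per_cycle):
    """Theoretical peak floating-point operations per second."""
    return nodes * cores_per_node * clock_ghz * 1e9 * flops_per_cycle

# A small 16-node cluster of 2.5 GHz 32-core processors, each core
# capable of 16 double-precision FLOPs per cycle (e.g. via FMA units):
peak = peak_flops(nodes=16, cores_per_node=32, clock_ghz=2.5, flops_per_cycle=16)
print(f"{peak / 1e12:.1f} teraflops")  # 20.5 teraflops
```

Real systems sustain well below theoretical peak on most workloads, which is why benchmarks such as LINPACK report measured rather than peak performance.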

Steve Conway, IDC research vice president for HPC, says: “HPC systems can handle more complex queries, more variables and faster turnaround requirements. The move to HPC has already saved PayPal more than $700m and is saving tens of millions of dollars per year for other commercial companies, on top of the benefits reported by established HPC users in government, academia and industry.”

According to IDC, the broader HPC market of storage, servers, software, middleware and services is projected to grow from $20bn in 2013 to $29bn by 2018.

"We are now forecasting a 7.4% CAGR for the HPC server market from 2013 to 2018, and we expect the HPC server market to exceed $14.7bn by 2018," says Earl Joseph, IDC’s HPC vice president.

The move to HPC has already saved PayPal more than $700m and is saving tens of millions of dollars per year for others

Steve Conway, IDC

Organisations in a variety of market segments are turning to HPC technologies to tackle big data analytics workloads effectively, according to Conway. HPC systems are typically used by scientists, academics, pharma companies, engineers and government agencies such as the military, and some are taking these workloads to the cloud.

One US-based company wanted to build a 156,000-core supercomputer for molecular modelling to develop more efficient solar panels. It used the AWS cloud to launch its supercomputer system simultaneously across the US, South America, Asia Pacific and Ireland.

Running at 1.21 petaflops of aggregate compute power to simulate 205,000 materials, the system crunched 264 compute years in just 18 hours. “This made this system one of the top 50 supercomputers in the world,” says AWS technology evangelist Ian Massingham.

With its low costs, public cloud has democratised access to supercomputing, which used to be financially unviable for many organisations and limited to those who could spend tens of millions on hardware.

HPC software company Cycle Computing, for example, would have had to spend $68m to run a supercomputer of the size and scale it needed in a traditional IT model, says Massingham. The AWS bill for the system was $33,000.

Cycle Computing itself offers cloud HPC software to enterprises such as Novartis and Johnson & Johnson.

Others such as Pfizer, Unilever, Bristol-Myers Squibb (BMS), Bankinter and Nokia are all using the AWS cloud to speed up research and reduce IT costs by creating cluster compute or cluster GPU servers to meet their workload requirements on demand.

Another AWS user is Oxford University, which relies on the cloud for its research on the Malaria Atlas Project, creating detailed global malaria maps to help in the fight against the disease.

We now have access to the kind of serious parallel processing that we need to implement model runs in feasible timescales and the storage to deal with the massive model output

Pete Gething, Oxford University

Dr Pete Gething, of the university’s department of zoology, says: “Current knowledge is surprisingly patchy and this hampers efforts to target funds and resources to the people that need them most. All models use top-end spatial statistics, and these don’t come cheap when you’re mapping things down to 5km by 5km pixels across the whole world. 

“Up to now, computation and storage have been major restrictions, placing constraints on the models we are able to run. With AWS, we now have access to the kind of serious parallel processing that we need to implement model runs in feasible timescales and the storage to deal with the massive model output.”

And at pharmaceutical company Pfizer, HPC services are being used in research ranging from the deep biological understanding of diseases to the design of safe and effective therapeutic agents. Pfizer’s in-house HPC software and systems support large-scale data analysis, research projects, clinical analytics and modelling. But when it wanted to expand its HPC capabilities further, it turned to AWS in 2010.

By using thousands of offsite servers as well as its own machines, Pfizer compressed compute time from weeks to hours, making quicker financial and strategic decisions, and saving millions of dollars, according to Michael Miller, the company’s head of HPC for R&D. Public cloud HPC helped Pfizer to cut R&D costs by $600m.

The primary cost savings have come from avoiding infrastructure spend. “Pfizer did not have to invest in additional hardware and software, which is only used during peak loads, and that saving allowed for investments in other research activities,” explains Miller. “AWS is not a replacement but an addition to our HPC capabilities, providing a unique augmentation to our computing capabilities.”

And that’s not all. Cancer Research UK uses the AWS cloud for the back-end of its ‘Genes in Space’ game, which allows people to help classify cancer research data while playing a game. And gene analysis now takes Unilever hours, instead of weeks, with productivity quintupling as a result of cloud HPC.

Public cloud HPC is not restricted to academia and scientists. Financial institutions in Europe also exploit the cheap infrastructure for hyperscale computing. Spain’s sixth largest bank Bankinter uses HPC on AWS to run credit risk simulations to evaluate the financial health of its clients. With supercomputers on the cloud, the bank has brought down the average time for running simulations from 23 hours to 20 minutes. It estimates that hardware alone would cost 100 times more if it moved the workload off the cloud.
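The scale of that reduction is simple arithmetic:

```python
# The speedup behind the Bankinter example: a credit-risk simulation
# that took 23 hours on in-house hardware completes in 20 minutes
# on a cloud HPC cluster.

before_minutes = 23 * 60   # 23 hours on-premise
after_minutes = 20         # 20 minutes on cloud HPC

speedup = before_minutes / after_minutes
print(f"{speedup:.0f}x faster")  # 69x faster
```

A roughly 69-fold reduction is consistent with spreading an embarrassingly parallel simulation across many more nodes than the bank could justify owning.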

Another European financial sector player that performs large-scale HPC in the public cloud is insurance firm Mapfre, which uses it to calculate its solvency. Every month, insurance companies have to perform a solvency check to test their risk under the worst-case scenario. All customer policies have to be entered into mathematical calculations to check whether the company could meet the potential payout. Running these calculations requires expensive HPC machines that are used only a few times a month, according to AWS.
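A solvency check of this kind is often done by Monte Carlo simulation. The sketch below is a hypothetical toy model, not Mapfre's actual method: each trial draws claims across the policy book and checks whether reserves cover them. In production it is the scale, millions of policies and trials run in parallel, that demands HPC.

```python
import random

# Hypothetical toy model of a Monte Carlo solvency check (not the
# insurer's actual method). Each trial decides, policy by policy,
# whether a claim occurs; the firm is solvent in that trial if
# reserves cover the total claims drawn.

def solvency_probability(policies, reserves, trials=2000, seed=42):
    """policies: list of (sum_insured, claim_probability) pairs."""
    rng = random.Random(seed)
    solvent = 0
    for _ in range(trials):
        total_claims = sum(
            sum_insured for sum_insured, p in policies if rng.random() < p
        )
        if total_claims <= reserves:
            solvent += 1
    return solvent / trials

# 1,000 policies of EUR 100,000 each, 1% claim probability,
# EUR 2m in reserves:
policies = [(100_000, 0.01)] * 1000
print(f"P(solvent) = {solvency_probability(policies, reserves=2_000_000):.2f}")
```

Each trial is independent, so the loop parallelises trivially across cluster nodes, which is exactly the shape of workload that suits burst capacity rented by the hour.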

The cloud service allows the firm to spin up a supercomputer on demand and shut it down when finished. This is helping Mapfre save substantially: the on-premise hardware investment over three years is put at more than €1m, compared with €180,000 on the cloud for the same period.

AWS has released cfncluster, a sample code framework that deploys and maintains HPC clusters on the Amazon cloud. “We have made it available, for free, for the community to use and it is available on GitHub,” says Massingham.
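The typical workflow, sketched from the cfncluster project's documentation from memory (the exact command names and options should be checked against its GitHub README, and note the tool has since been superseded by AWS ParallelCluster), looks roughly like this:

```shell
# Hedged sketch of the cfncluster workflow; verify commands against
# the project's GitHub README before use.
pip install cfncluster        # the tool is distributed via PyPI
cfncluster configure          # interactive setup: region, key pair, VPC
cfncluster create mycluster   # provisions head and compute nodes via CloudFormation
# ... submit jobs to the cluster's batch scheduler, then tear it down:
cfncluster delete mycluster   # stop paying for idle nodes
```

The create/delete pairing is the point: the cluster exists, and is billed, only for the duration of the job.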

The assumption that it is cheaper to do HPC on the cloud is concerning. HPC usually maintains a very high utilisation rate

Addison Snell, Intersect360

Yet cloud still forms a very small part of the total supercomputing segment, with only about 36% of enterprises using the cloud for HPC, according to HPC research firm Intersect360. “Even with these enterprises, it tends to be just 5% of their workloads,” says Addison Snell, its chief executive. “But it seems to be growing.”

He adds: “The assumption that it is cheaper to do HPC on the cloud is concerning. HPC usually maintains a very high utilisation rate and that was one of the biggest USPs of cloud services.”
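Snell's utilisation argument can be made concrete with a toy break-even model (all figures hypothetical): owning a cluster is a fixed cost whether it is busy or idle, while cloud is billed only for the hours consumed, so cloud wins at low utilisation and loses at the high utilisation typical of HPC shops.

```python
# Toy break-even model for the utilisation argument; all prices
# are hypothetical. Owning costs a fixed amount per year; cloud
# costs scale with the hours actually used.

HOURS_PER_YEAR = 8760

def annual_cost_owned(capex_per_year):
    return capex_per_year          # paid whether busy or idle

def annual_cost_cloud(rate_per_hour, utilisation):
    return rate_per_hour * HOURS_PER_YEAR * utilisation

owned = annual_cost_owned(capex_per_year=500_000)
cloud_rate = 100  # hypothetical $/hour for an equivalent cluster

for utilisation in (0.05, 0.25, 0.60, 0.95):
    cloud = annual_cost_cloud(cloud_rate, utilisation)
    cheaper = "cloud" if cloud < owned else "owned"
    print(f"utilisation {utilisation:>4.0%}: cloud ${cloud:>9,.0f} -> {cheaper} wins")
```

With these made-up numbers the break-even sits at roughly 57% utilisation; a site that keeps its cluster near-fully loaded year round has little to gain from on-demand pricing.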

So while Cycle Computing's $33,000 AWS bill, for instance, was far cheaper than the tens of millions it would have had to spend on on-premises infrastructure, it was only so cheap because the company spot-purchased unused capacity. AWS Spot Instances are a purchasing option that allows customers to buy unused Amazon EC2 compute capacity at a heavily discounted rate.

Spot Instances give Amazon a flexible way to sell spare capacity. The instances are acquired through a bidding process in which a customer specifies the maximum price per hour they are willing to pay.
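The bidding mechanism can be sketched in a few lines (the price series below is invented): the instance runs only in hours where the market price stays at or below the customer's bid, and is interrupted otherwise.

```python
# Simplified simulation of the spot bidding mechanism described
# above; the hourly price series is invented for illustration.

def run_spot(bid, hourly_market_prices):
    hours_run = 0
    total_cost = 0.0
    for price in hourly_market_prices:
        if price <= bid:          # capacity granted this hour
            hours_run += 1
            total_cost += price   # billed at the market price, not the bid
        # else: the instance is interrupted for this hour
    return hours_run, total_cost

prices = [0.12, 0.15, 0.31, 0.14, 0.13, 0.45, 0.16, 0.12]
hours, cost = run_spot(bid=0.20, hourly_market_prices=prices)
print(f"ran {hours} of {len(prices)} hours for ${cost:.2f}")  # ran 6 of 8 hours for $0.82
```

The trade-off is plain in the model: the discount comes in exchange for tolerating interruption, which suits batch HPC jobs that can checkpoint and resume.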

Conway agrees: “Enterprises may be using cloud for HPC, but it is really for high-compute and large but simple workloads. It will be used more as cloud and HPC both evolve.”
