Hybrid high-performance computing (HPC) that relies on the graphics processing unit (GPU) is not exactly a new concept. While the general-purpose graphics processing unit (GPGPU ) has been associated with HPC clusters for almost a decade now, the model began gaining momentum in organizations only over the last five years.
The Indian HPC community is no stranger to the GPGPU -powered HPC setup—also known as heterogeneous computing or GPU computing. Before x86- based commodity clusters entered the HPC space, supercomputers employed the concept of array processors to accelerate specific mathematical operations. With GPUs providing massively parallel architectures (using hundreds of cores on a single die) as well as attractive price-to-performance ratios, it was only natural that GPGPUs started entering HPC setups as “co-processors.” This brought significant performance boosts at a low cost that was previously unimaginable.
An example of such a GPU computing implementation is the SAGA -220 (Supercomputer for Aerospace with GPU Architecture) cluster at Vikram Sarabhai Space Research Center (VSSC ), a leading Indian space research center. The facility, with a theoretical peak performance of 220 teraflops, uses an in-house developed Linux cluster running on 200 quad-core dual Xeon diskless base nodes and 400 Nvidia Tesla C2070 (Fermi) GPUs. This cluster came about as an upgrade to VSSC ’s earlier 20 teraflop x86-based HPC setup which ran PA - RAS-3D, a serial-code based computational fluid dynamics application. VSSC’s performance boost objective of 10x meant a facility with nearly 6,000 CPUs, along with a power consumption of 5 megawatts and accompanying cooling requirements.
After the move to GPU computing, VSSC’s HPC setup makes do with power consumption levels of less than 150 kilowatt. The project cost of nearly Rs 14 crore (Rs 140 million or about $2.5 million) includes all civil work, power consumption, procurement and maintenance. These are the typical benefits—performance boosts as well as savings in energy, cost and space—that accompany GPU computing implementation in HPC environments.
HPC Options for All Budgets, Shapes and Sizes
With leading OEMs such as IBM, Dell, HP and SGI offering hybrid CPUGPU support in their product ranges for HPC requirements, the hardware ecosystem is already in place for GPU computing. For example, Dell’s latest 12g servers feature GPU support across the product range. “Based on our user feedback, our servers now support multiple video cards because workloads like BI [business intelligence] need graphic cards with capabilities beyond that of a standard processor. Instead of adding servers, add a GPU card to get performance boosts,” said Sitaram Venkat, director of the enterprise solutions business at Dell India.
GPU computing makes it possible for a 4-teraflop machine to be available at the workstation level using a single CPU and four GPU cards.
Heterogeneous computing product variants available for HPC are now available in various form factors. GPU computing makes it possible for a 4-teraflop machine to be available at the workstation level using single CPU and four GPU cards. Many Indian companies even opt for clusters of these machines that use Infiniband interconnects. Such heterogeneous workstations start in the range of Rs 500,000 (about $8,900) for a 2-teraflop machine with two GPU cards. GPU-enabled blades and rack servers are also very much in the offing.
Dissecting GPU computing
It’s useful to know the differences between GPU computing and traditional x86 CPU-based HPC clusters. As is clear from Figure 1, CPUs and GPUs tackle the processing of tasks differently. While CPUs excel at serial processing, GPUs are better at handling applications that require high floating point calculations and lower power consumption.
From the architectural standpoint, GPUs are inherently single instruction multiple data (SIMD) processors that rely on data parallelism. This calls for the use of parallel programming models to gain performance benefits in GPU computing setups. Older programs with serial code need to be rewritten in case of migration to GPU computing architectures, for maximum performance benefits. “In parallel processing, the program has to be written for efficient utilization of all the cores. That’s a bit of a change from the normal programming approach. It requires better understanding of the design,” said Vishal Dhupar, managing director, sales and marketing for South Asia at Nvidia.
Figure 1. Differences between GPU Computing and Traditional x86 CPU-based HPC Clusters
A fundamental difference between the CPU and GPU is the latter’s higher amount of cores per die, along with smaller cache memory and registers. This may create memory management bottlenecks. The program has to rely on system memory, as well as low memory requirements. “Programs that run on lower memory can benefit from the use of GPUs. Applications where the sections of the code can be modified to utilize lower memory or undertake more CPU intensive jobs can benefit from the accelerator,” said Sandeep Lodha, vice president for sales and marketing at NetWeb Technologies, a GPGPU HPC solutions vendor.
In VSSC ’s case, the developer team rewrote PARAS -3D since it was not inherently data parallel. Nvidia’s CUDA programming environment was used for conversion of PARAS -3D to data parallel SIMD. Of the more than 2,000 lines of original code, 200 lines had to be modified.
GPU-enabling Your App
Open Computing Language (OpenCL): Started by Apple and then spun into a proposal in collaboration with Nvidia, AMD, Intel and IBM, OpenCL has evolved into a very popular open cross-platform programming language. However, being a low-level language, there is a limited number of programmers with skill sets suitable for OpenCL.
Compute Unified Device Architecture (CUDA): The CUDA programming environment is a closed ecosystem. It allows application developers to code only on Nvidia’s GeForce, Ion, Quadro and Tesla GPUs.
Others include Intel’s Thread Building Blocks, as well as extensions to C++ like Microsoft’s C++ AMP (Accelerated Massive Parallelism). Intel’s C/ C++ extension Cilk Plus is also worth considering.
Which HPC Applications Benefit Most From GPU
There are several code characteristics that make certain HPC applications the best fit for GPU computing. “There are many classes of programs that require high floating point processing, are energy-intensive but not all that branchy.
These can be used by organizations as GPU-enabled applications. There are many applications that can benefit from that,” said Alan Lee, corporate vice president of research and advanced development at AMD.
Foremost among these include parallel processing-intensive tasks like seismic data processing, complex fluid dynamics, medical imaging and data analytics applications. While GPU computing is still not suitable for requirements such as database and SQL queries, it shows potential when it comes to analytics requirements.
Fly in the Ointment
Despite the hunky dory picture of GPU computing portrayed so far, several Indian HPC users are still not convinced about the real-world feasibility of GPG - PUs—mainly due to pilot projects going awry. These include leading HPC users such as the Bangalore-based Centre for Mathematical Modelling and Computer Simulation and Kolkata-based Saha Institute of Nuclear Physics.
Such experiences among users have not cast a favorable light on the success rates of GPU computing. This is where the increasing numbers of off-the-shelf GPU-enabled HPC applications—open and commercial—can make a difference.
ISVs have also contributed a fair bit to the FUD surrounding GPU computing. “Indian ISVs are not able to express how they can provide users with a different type of throughput using GPU computing. Many ISVs are busy selling advantages of their own applications rather than explaining how infrastructure like hybrid architectures can scale up infrastructure. The awareness level with Indian ISVs becomes a limitation,” said Dhupar.