Prepare to deploy custom hardware to speed up AI

Latest forecasts suggest spending on artificial intelligence is ramping up, and organisations that need raw machine learning performance are turning to custom hardware

Cliff Saran, Managing Editor

Published: 04 Apr 2019 10:00

Spending in artificial intelligence (AI) across Europe is set to grow by 49% in 2019 compared with 2018, as organisations begin using the technology to gain a competitive advantage, according to IDC.

Andrea Minonne, senior research analyst at IDC Customer Insight & Analysis in Europe, said: “Many European retailers, such as Sephora, Asos and Zara, as well as banks such as NatWest and HSBC, are already experiencing the benefits of AI – including increased store visits, higher revenues, reduced costs, and more pleasant and personalised customer journeys.

“Industry-specific use cases related to automation of processes are becoming mainstream and the focus is set to shift toward next-generation use of AI for personalisation or predictive purposes.”

There is industry consensus that a traditional CPU-based computer architecture is generally not up to the task of running machine learning algorithms. Today, graphics processors offer the performance needed to run current machine learning applications.

But the web giants that require even greater levels of performance are now developing custom AI acceleration hardware. For instance, in February the FT reported that Facebook was developing its own chip for machine learning.

Facebook joins Google, which announced its custom AI chip three years ago. In 2016, Google unveiled a tensor processing unit (TPU), a custom application specific integrated circuit (Asic) it had built specifically for machine learning – and tailored for the TensorFlow deep neural network (DNN) learning module.

At the time, Norm Jouppi, distinguished hardware engineer at Google, wrote: “We have been running TPUs inside our datacentres for more than a year, and have found them to deliver an order of magnitude better-optimised performance per watt for machine learning. This is roughly equivalent to fast-forwarding technology about seven years into the future [three generations of Moore’s Law].”

Google’s TPU is available on GCP. The top-end v2-512 Cloud TPU v2 Pod is currently being tested and costs $422.40 per pod slice per hour.

Asics are exceedingly expensive and limited because they are designed to run one application, such as the TensorFlow DNN module in the case of Google’s TPU. Microsoft Azure offers acceleration using a field programmable gate array (FPGA), and according to Microsoft, FPGAs provide performance close to Asics.

“They are also flexible and reconfigurable over time, to implement new logic,” it said. Its hardware accelerated machine learning architecture, dubbed Brainwave, is based on Intel’s FPGA devices to achieve what Microsoft said “enables data scientists and developers to accelerate real-time AI calculation”.

Acceleration with GPUs

Arguably, graphics processing units (GPUs) are the entry point for most organisations looking to deploy hardware to accelerate machine learning algorithms. According to Nvidia, GPUs fit well with the need to train deep neural networks for AI applications.

“Because neural networks are created from large numbers of identical neurons, they are highly parallel by nature,” it said. “This parallelism maps naturally to GPUs, which provide a significant speed-up over CPU-only training.”

Jos Martin, senior engineering manager and principal architect for parallel computing tools at MathWorks, said: “Without the advent of GPUs and the fast computation that they bring, we would not be seeing the current explosion in this area. AI developments and GPU computing go hand in hand to accelerate each other’s growth.”

Among the advances in GPU technology over the last few years, they now support what is known in computer science as “mixed precision algorithms”, said Martin.

GPUs for machine learning are easily accessible from the cloud, such as Amazon EC2 P3, which offers up to eight Nvidia V100 tensor core GPUs and up to 100 Gbps of networking throughput for $31.22 per hour.

Prepare to deploy custom hardware to speed up AI

Latest forecasts suggest spending on artificial intelligence is ramping up, and organisations that need raw machine learning performance are turning to custom hardware

Acceleration with GPUs

Read more about AI acceleration

Read more on Artificial intelligence, automation and robotics

10 top AI hardware and chip-making companies in 2025

GPUs vs. TPUs vs. NPUs: Comparing AI hardware options

tensor processing unit (TPU)

Google and Hugging Face unveil AI partnership