
AWS G4 aims to lower cost of GPU-powered AI inference

Amazon Web Services seeks to cut the cost of running artificial intelligence inference engines with the introduction of an Nvidia T4 GPU-based instance

The Amazon Web Services (AWS) public cloud appears to have undercut its main competitors with the introduction of a new GPU (graphics processing unit) EC2 instance called G4, which uses the Nvidia T4 GPU.

AWS said the new G4 instance is designed to help accelerate machine learning inference and graphics-intensive workloads, both of which are computationally demanding tasks that benefit from additional GPU acceleration. Application areas include adding metadata to an image, object detection, recommender systems, automated speech recognition, and language translation.

AWS said the G4 instance can be used both for training artificial intelligence (AI) algorithms and for running the inference engines behind AI-powered applications. It said GPUs enable its customers to cut machine learning training times from days to hours, but that the bulk of the cost of AI lies in running inference in production.

“Inference is what actually accounts for the vast majority of machine learning’s cost,” it said. “According to customers, machine learning inference can represent up to 90% of overall operational costs for running machine learning workloads.”

On-demand pricing for the g4dn.xlarge instance, with four virtual cores, one GPU and 16GB of memory, starts at $0.526 per hour. The eight virtual core g4dn.2xlarge instance, with 32GB of RAM, costs $0.752 per hour.

By comparison, Google’s one-GPU V100 instance, with 16GB of memory, is currently priced at $0.74 per hour under Google’s pre-emptible GPU pricing model. Meanwhile, Microsoft’s NC6s v3 instance on Azure, with six virtual cores, one V100 GPU and 112GB of memory, costs £2.2807 per hour on-demand.
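Taken together, the quoted hourly rates work out as follows over a 730-hour month. This is only a rough sketch: it ignores the currency mismatch (the Azure figure is in pounds) and the differences in GPU generation and instance specification between the providers.

```python
# Rough monthly cost projection from the hourly rates quoted above,
# assuming a 730-hour month and no sustained-use or reserved discounts.
HOURS_PER_MONTH = 730

hourly_rates = {
    "AWS g4dn.xlarge (USD)": 0.526,
    "AWS g4dn.2xlarge (USD)": 0.752,
    "Google V100 pre-emptible (USD)": 0.74,
    "Azure NC6s v3 (GBP)": 2.2807,
}

for name, rate in hourly_rates.items():
    print(f"{name}: {rate * HOURS_PER_MONTH:.2f} per month")
```

On these assumptions, the g4dn.xlarge comes to roughly $384 a month on-demand, against about $540 for Google's pre-emptible V100.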

Using G4 for machine learning on the AWS public cloud is supported via Amazon SageMaker or AWS Deep Learning AMIs (Amazon Machine Images). AWS said it supports machine learning frameworks such as TensorFlow, TensorRT, MXNet, PyTorch, Caffe2, CNTK and Chainer.
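As an illustration, a single on-demand g4dn.xlarge running a Deep Learning AMI could be launched with the boto3 SDK along these lines. The AMI ID and key name are placeholders, and `g4_launch_params` is a helper written for this sketch, not part of any AWS API.

```python
def g4_launch_params(ami_id: str, key_name: str) -> dict:
    """Build the keyword arguments for boto3's ec2.run_instances call
    to start one on-demand g4dn.xlarge (1 GPU, 4 vCPUs, 16GB memory)."""
    return {
        "ImageId": ami_id,            # e.g. a Deep Learning AMI in your region
        "InstanceType": "g4dn.xlarge",
        "MinCount": 1,
        "MaxCount": 1,
        "KeyName": key_name,
    }

# Usage sketch (requires boto3 and AWS credentials):
# import boto3
# ec2 = boto3.client("ec2", region_name="us-east-1")
# ec2.run_instances(**g4_launch_params("ami-xxxxxxxx", "my-key"))
```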

The company plans to offer support for Amazon Elastic Inference on G4 instances, which it claimed would slash the cost of inference by up to 75%.
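Applying that headline figure arithmetically to the g4dn.xlarge rate gives the claimed best case. Note that Elastic Inference achieves its savings by attaching fractional GPU accelerators to cheaper instances, so this is only the "up to" ceiling, not a quoted price.

```python
# Best-case arithmetic on AWS's "up to 75%" Elastic Inference claim,
# applied to the quoted $0.526/hour g4dn.xlarge on-demand rate.
base_rate = 0.526        # USD per hour
claimed_saving = 0.75    # "up to 75%"
best_case = base_rate * (1 - claimed_saving)
print(f"${best_case:.4f} per hour")  # $0.1315 per hour
```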

According to AWS, G4 instances can also be used for graphics-intensive workloads. Application areas include remote graphics workstations, video transcoding, photo-realistic design, and game streaming in the cloud. It said G4 offered up to a 1.8x increase in graphics performance and up to 2x video transcoding capability over the previous-generation G3 instances.


“These performance enhancements enable customers to use remote workstations in the cloud for running graphics-intensive applications like Autodesk Maya or 3D Studio Max, as well as efficiently create photo-realistic and high-resolution 3D content for movies and games,” said Amazon.

Matt Garman, vice-president, compute services at AWS, said: “With new G4 instances, we are making it more affordable to put machine learning in the hands of every developer. And with support for the latest video decode protocols, customers running graphics applications on G4 instances get superior graphics performance over G3 instances at the same cost.”

Games developer Electronic Arts is among the organisations making use of the G4 instance. Erik Zigman, EA’s vice-president of cloud, social, marketplace and cloud gaming engineering, said: “Working with AWS’s G4 instance has enabled us to build cost-effective and powerful services that are optimised for bringing online gaming to a wide range of devices.”
