Circular IT series - OctoML: Lighter carbon footprints on planet ML

This is a guest post for Computer Weekly’s ‘circular technology’ series written by Jason Knight in his capacity as co-founder and chief product officer (CPO) at OctoML – a company known for its platform and solutions that automatically maximise Machine Learning (ML) model performance on any hardware, cloud or edge device.

Knight writes as follows…

AI investment and innovation have exploded over the last decade.

An ecosystem of open source ML tools, frameworks, libraries and compilers has brought ML to the verge of becoming a mainstream discipline for enterprise IT. But the AI/ML renaissance is colliding with a looming global economic downturn and an ongoing chip shortage, both of which threaten to slow the pace of innovation.

Economic uncertainty is putting ML projects at risk.

ML projects often carry large price tags, and their success rates hover around 50%. These conditions come at a terrible time for companies that invested early in ML projects that are now on the cusp of production deployment.

ML’s CO2 cost

Training a deep learning model and running it in production can cost hundreds of thousands, if not millions, of pounds, dollars or other currencies. The environmental costs are significant too. In 2019, an analysis from the University of Massachusetts Amherst found that training a single large AI model can emit “more than 626,000 pounds of carbon dioxide equivalent”.

To put it in perspective, this is nearly five times the lifetime emissions of the average car.
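The “nearly five times” comparison is simple arithmetic. The minimal Python sketch below reproduces it, assuming the same study’s figure for the lifetime emissions of an average car including fuel, roughly 126,000 pounds of CO2 equivalent (a number not quoted in this article), and also converts the training figure to metric tonnes.

```python
# Training figure quoted in the article (CO2 equivalent, in pounds).
training_lbs = 626_000
# Assumption: lifetime emissions of an average car incl. fuel, as given
# in the same 2019 UMass Amherst comparison table.
car_lifetime_lbs = 126_000

LB_TO_KG = 0.45359237  # exact definition of the avoirdupois pound

ratio = training_lbs / car_lifetime_lbs            # how many car lifetimes
training_tonnes = training_lbs * LB_TO_KG / 1000   # pounds -> metric tonnes

print(f"Ratio to one car lifetime: {ratio:.2f}x")
print(f"Training emissions: {training_tonnes:.0f} tonnes CO2e")
```

The ratio comes out just under five, and the training figure works out to roughly 284 metric tonnes of CO2 equivalent.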

Achieving AI sustainability will require a combination of datacentre efficiencies, adoption of lightweight ML architectures and significant performance gains from the hardware itself. Adopting specialised chips whose architecture and circuits are designed to handle AI/ML workloads can improve performance and energy efficiency by 2x–5x.

Snagglechips

But there’s a snag.

More than two years after the onset of the pandemic, the ongoing chip shortage continues to drive down the availability and drive up the cost of specialised hardware.

Much more than most compute workloads, ML models often need specific hardware targets to reach reasonable cost and performance. Specialised AI/ML chips work best on specific numeric data types, model types and sparsity patterns. If the required chip is scarce or unavailable, it can be difficult to port the workload to another, more readily available hardware option without extensive (and expensive) re-engineering.

The new model/hardware pairing may result in such poor performance that it is no longer viable in production. Incidentally, this inflexible path to production is part of the reason why 47% of ML models fail.

The bottom line is that we cannot expect an influx of AI/ML chips to reach the broader market anytime soon. Because of this, both practitioners and AI/ML technology providers must find ways to maximise the performance and efficiency of today’s common hardware.

Three routes to ML efficiency

There are three ways they can do this:

Squeeze better performance from the hardware targets already available – this can be achieved by applying techniques from research such as sparsity and quantisation, using pre-accelerated models, identifying the fastest runtime for your chosen model, and applying graph optimisation and advanced compilation through open source compiler frameworks such as Apache TVM.
Transparent, vendor-neutral benchmarking – this lets customers ‘shop around’ to evaluate which hardware targets and cloud providers enable them to meet cost SLAs and still hit performance metrics.
Invest in technologies that make it easier to move ML workloads from one hardware target to another – when compute costs go up, or when a cheaper, equally performant option becomes available, it shouldn’t take an army of engineers to rebuild the entire ML pipeline. Instead, ops teams should be able to switch from CPU to GPU targets (or vice versa) to optimise utilisation and cost, right-size CPU instances, or move from one CPU instance type to another.
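Of the techniques listed under the first route, quantisation is the easiest to show in a few lines. The sketch below is a minimal, illustrative version of symmetric per-tensor int8 quantisation in plain Python, assuming the weights arrive as a flat list of floats. The function names are hypothetical, and production toolchains typically use more elaborate schemes (per-channel scales, calibration data, hardware-specific formats).

```python
from typing import List, Tuple


def quantise_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantisation: w is approximated by scale * q."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Map the largest magnitude to 127; guard against an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to the symmetric int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantise(q: List[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate float values."""
    return [scale * v for v in q]


weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantise_int8(weights)
approx = dequantise(q, scale)
max_err = max(abs(a - w) for a, w in zip(approx, weights))
print(q, scale, max_err)
```

Storing each weight in one byte instead of the four used by float32 cuts weight memory roughly fourfold, and integer arithmetic is often faster on common CPUs. The cost is a rounding error of at most half the scale per weight, which is why real deployments check accuracy after quantising.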
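The second and third routes both come down to one decision: given benchmark results for several targets, which is the cheapest one that still meets the performance target? The following is a minimal sketch, assuming results have already been collected per target. The class, field and target names and all numbers are hypothetical, not real measurements or cloud prices. It ranks targets by cost per million inferences rather than by hourly price, because a pricier instance with higher throughput can still be cheaper per request.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BenchmarkResult:
    target: str              # hypothetical target label, e.g. "cpu-xlarge"
    p95_latency_ms: float    # measured 95th-percentile latency on this target
    throughput_per_s: float  # measured inferences per second (must be > 0)
    price_per_hour: float    # instance price, all in one currency

    @property
    def cost_per_million(self) -> float:
        """Cost of serving one million inferences at the measured throughput."""
        inferences_per_hour = self.throughput_per_s * 3600
        return self.price_per_hour / inferences_per_hour * 1_000_000


def cheapest_meeting_sla(results: List[BenchmarkResult],
                         latency_sla_ms: float) -> Optional[BenchmarkResult]:
    """Return the lowest cost-per-inference target that meets the latency SLA."""
    eligible = [r for r in results if r.p95_latency_ms <= latency_sla_ms]
    return min(eligible, key=lambda r: r.cost_per_million, default=None)


# Illustrative numbers only: not real measurements or cloud prices.
results = [
    BenchmarkResult("gpu-small", 8.0, 400.0, 0.90),
    BenchmarkResult("cpu-large", 35.0, 60.0, 0.60),
    BenchmarkResult("cpu-xlarge", 18.0, 120.0, 1.10),
]
choice = cheapest_meeting_sla(results, latency_sla_ms=20.0)
print(choice.target if choice else "no target meets the SLA")
```

With these made-up figures the GPU target wins despite its higher hourly price, because its throughput makes each inference cheaper. If that price rose, re-running the same selection could pick a CPU target instead. That switch is only cheap in practice if the model is already packaged to run on each candidate target, which is the point of the third route.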