Bit by Bit: Novel techniques to shrink AI's carbon footprint

We look at some of the new techniques that promise to reduce energy usage of AI by 65%

Oliver King-Smith

Published: 21 Mar 2024

Concern is growing over the rapid expansion of artificial intelligence (AI) and the demand its servers are placing on electricity and water supplies. A new Nvidia DGX computer, the gold standard for AI work, consumes over 10kW of power. Big Tech will buy millions of these systems this year, and together they will use more power than all of New York City.

But it is not just the electricity needed to run these computers. They get hot, really hot, and that heat has to be removed. Cooling typically takes twice the power of the computer itself, so that 10kW machine is really using 30kW when running. These new servers will consume three times more electricity than the whole of California used in 2022! To get around this, server farms are starting to use water, lots of water, to help cool their equipment. This saves electricity, but it uses our precious fresh water to cut costs.
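The arithmetic behind that 30kW figure can be written out as a short sketch. The function name and the `cooling_ratio` parameter are illustrative assumptions, and the ratio of two is simply the rule of thumb quoted above rather than a measured value for any particular data centre.

```python
def total_power_kw(compute_kw: float, cooling_ratio: float = 2.0) -> float:
    """Total draw when cooling needs `cooling_ratio` times the compute power.

    cooling_ratio=2.0 is the rule of thumb from the article, not a measurement.
    """
    return compute_kw * (1.0 + cooling_ratio)


server_kw = 10.0  # power quoted in the article for one DGX-class machine
total_kw = total_power_kw(server_kw)
print(f"{server_kw:.0f}kW of compute -> {total_kw:.0f}kW including cooling")
```

Lowering `cooling_ratio` shows how much of the total depends on the cooling method rather than on the computer itself.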

AI is hungry for power, and things are going to get worse. How can we solve this problem? Fortunately, researchers are already pursuing more efficient methods of making and using AI. Four promising techniques are model reuse, ReLoRA, Mixture of Experts (MoE) and quantisation.

Model reuse involves retraining an already trained model for a new purpose, saving time and energy compared to training from scratch. This approach not only conserves resources but also often results in better-performing models. Both Meta (Facebook's parent) and Mistral have been good about releasing models that can be reused.

ReLoRA and LoRA reduce the number of calculations needed when retraining models for new uses by training small, low-rank update matrices instead of every weight in the model. This saves further energy and enables the use of smaller, less power-hungry computers. Instead of relying on large, energy-intensive systems such as Nvidia's DGX, a modest graphics card can often suffice for retraining.
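To give a feel for why LoRA is so much cheaper, here is a minimal pure-Python sketch of the idea: the original weight matrix `W` stays frozen, and only two thin matrices `A` and `B` would be trained. The layer sizes, the rank and the helper names are assumptions chosen for illustration, and the sketch does not cover ReLoRA's periodic merging of updates.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, scale=1.0):
    """y = W x + scale * B (A x); W is frozen, only A and B would be trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]


def trainable_params(d_out, d_in, rank):
    """(full fine-tuning count, LoRA count) for one d_out x d_in weight matrix."""
    return d_out * d_in, rank * (d_out + d_in)


random.seed(0)
d_out, d_in, rank = 6, 4, 2  # toy sizes, chosen only for the demo
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
B = [[0.0] * rank for _ in range(d_out)]  # zero init: the adapter starts as a no-op
x = [1.0, -2.0, 0.5, 3.0]

# Before any training, the adapted layer behaves exactly like the original.
assert lora_forward(W, A, B, x) == matvec(W, x)

# Hypothetical 4096 x 4096 layer with rank 8.
full, lora = trainable_params(4096, 4096, 8)
print(f"{lora:,} trainable values instead of {full:,} ({lora / full:.2%})")
```

For that hypothetical layer, the adapter trains well under 1% of the values that full retraining would touch, which is where the savings in memory and computation come from.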

MoE models, such as those recently released by Mistral, use fewer parameters for each piece of input than conventional models of a similar size, resulting in fewer calculations and reduced energy consumption. They only activate the blocks needed for the task in hand, much like turning off lights in unused rooms, leading to a 65% reduction in energy usage.
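The "lights off in unused rooms" idea can be sketched in a few lines. In this toy version a router scores eight experts and only the top two are actually run; the expert functions, the router scores and the choice of two out of eight are all assumptions for illustration, not a description of any specific released model.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(gate_scores, k=2):
    """Return [(expert_index, weight)] for the k highest-scoring experts."""
    order = sorted(range(len(gate_scores)), key=lambda i: gate_scores[i], reverse=True)
    top = order[:k]
    weights = softmax([gate_scores[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x, experts, gate_scores, k=2):
    """Weighted sum of the outputs of only the selected experts."""
    y, calls = 0.0, 0
    for idx, weight in route_top_k(gate_scores, k):
        y += weight * experts[idx](x)  # unselected experts are never evaluated
        calls += 1
    return y, calls


# Eight toy "experts": each one is just a scalar function here.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
gate_scores = [0.1, 2.0, -1.0, 0.3, 1.5, 0.0, -0.5, 1.0]  # hypothetical router output

y, calls = moe_forward(3.0, experts, gate_scores)
print(f"{calls} of {len(experts)} experts ran for this input")
```

In a real model the shared layers outside the experts still run for every input, so the overall saving is smaller than the fraction of experts skipped.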

Quantisation is a technique that reduces the size of AI models. Quantising a model reduces the number of bits used to represent each parameter. This shrinks the model, enabling the use of less powerful and more energy-efficient hardware. While quantisation can slightly reduce model accuracy, for many practical applications the tradeoff is not noticeable.
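As a rough illustration, the sketch below quantises a handful of weights to 8-bit integers and then estimates how much storage a 47bn-parameter model, the size cited in this article, would need at different bit widths. It assumes a simple symmetric scheme with one scale for the whole list; production toolkits use more elaborate schemes, and the sample weights are made up.

```python
def quantise_int8(weights):
    """Symmetric quantisation: map floats to integers in [-127, 127]."""
    # Fall back to a scale of 1.0 if every weight is zero.
    scale = (max(abs(w) for w in weights) / 127.0) or 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantise(q, scale):
    """Recover approximate float values from the stored integers."""
    return [v * scale for v in q]


def model_size_gb(n_params, bits_per_param):
    """Storage for the weights alone, in gigabytes (10**9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


weights = [0.82, -1.27, 0.05, 0.33, -0.6]  # made-up sample values
q, scale = quantise_int8(weights)
restored = dequantise(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print("integers:", q, "largest rounding error:", max_error)

for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {model_size_gb(47e9, bits):.1f}GB")
```

The rounding error per weight is at most half of `scale`, which is why the loss in accuracy is usually small, while halving the bits per parameter halves the storage needed.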

By combining these four techniques, we have successfully reused a 47bn-parameter MoE model and retrained it for a client on a server that consumes less than 1kW of power, completing the process in just 10 hours. The client can also run the model on standard Apple Mac computers with energy-efficient M2 chips.

As AI becomes more prevalent, we need to start thinking about its energy and water usage. Research into more efficient training and utilisation methods is yielding promising results. By integrating these new techniques into our tool flows, we not only benefit our clients but also contribute to a more sustainable future for our planet.

Oliver King-Smith is CEO of smartR AI, a company that develops applications based on its SCOTi AI and alertR frameworks.

