Google uses machine learning – a branch of artificial intelligence – to improve the energy usage of its datacentres, Google datacentre chief Joe Kava told the Datacentres Europe 2014 conference.
He explained how, and shared his power usage effectiveness (PUE) tips with delegates, on the second day of the conference.
“Over the last eight years, the datacentre industry has done well to adopt the PUE as a broad metric to improve datacentre energy efficiency,” Kava said. “But PUE in its current form has its limitations. This is because the modern datacentre is a complex interaction of multiple mechanical, electrical and control systems.
“In our pursuit of extreme efficiency, we’ve hit upon a new tool – machine learning.”
Before explaining how Google uses machine learning and sharing a technical white paper on its datacentre tool, Kava had some tips for datacentre operators:
Challenge your assumptions
Datacentre operators and managers should not simply accept the metrics and tools they are given, but should question how good those metrics and tools are and whether they really help to cut equipment energy use.
Push the boundaries of operating parameters but do it smartly
“Push the boundaries but do it smartly by testing out your new ideas on a virtual machine in a pilot environment before putting it out into production.”
Use all the data you can
“Data is important if you want to achieve even better efficiency. There’s a story in every piece of data but all data may not be readily available. So be prepared to use tools and pull all data together to find hidden gems.”
Be obsessed with energy efficiency
That’s the only way you will help build a sustainable facility, Kava said, and energy savings equal cost savings. “We’re obsessed with saving energy and we’re always looking for ways to reduce our energy use even further.”
Always ask what your PUE should be rather than what it is
“Unless you know what it should be you don’t know if you are good at it or not.”
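For reference, PUE is the ratio of the total energy a facility draws to the energy delivered to its IT equipment, so 1.0 is the theoretical floor. A minimal Python sketch of comparing a measured PUE with a target follows; the function names, the readings and the target value are assumptions made for illustration and do not come from Google.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power usage effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


def gap_to_target(actual: float, target: float) -> float:
    """How far the measured PUE sits above the PUE it 'should' be."""
    return actual - target


# Hypothetical one-day readings (assumed values, not from the article).
measured = pue(total_facility_kwh=1200.0, it_equipment_kwh=1000.0)

# Hypothetical target for the same conditions (also an assumption).
shortfall = gap_to_target(measured, target=1.12)
```

A positive shortfall means the facility is using more overhead energy than the target says it should under those conditions.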
Consider the possibilities
Machine learning is just one possibility that Google has identified for more efficient power usage. “But there’s no discounting that there might be other ways to make datacentres energy efficient. Keep exploring the possibilities,” Kava said.
How Google uses AI to improve its PUE
“Jim Gao, an engineer on our datacentre team, is obsessed with machine learning,” Kava said. “Realising that we could be doing more with the data coming out of datacentres, Jim studied machine learning and started building models to predict – and improve – datacentre performance.”
The team’s model works like other machine learning systems, such as those used for speech recognition: a computer analyses large amounts of data to recognise patterns and “learn” from them.
“It is best to allow machines to go through vast amounts of data because they don’t get bored of number crunching like humans and they are more accurate,” Kava said. It is also difficult for humans to see how all the variables – IT load, outside air temperature, etc – interact with each other.
The datacentre team takes the data it gathers during its day-to-day facility operations and runs it through the model to help make sense of complex interactions. The model assesses 19 different variables fed in for datacentre energy analysis, including server load, number of chillers running, outside air temperature and outside air humidity.
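To make the idea of predicting PUE from operating variables concrete, here is a minimal sketch in Python. It uses ordinary least squares as a simple stand-in and is not the model described in Google's white paper. The three features, the made-up relationship used to generate the history, and every reading are invented for illustration.

```python
# Stand-in model: ordinary least squares, NOT the model from the white paper.
# All feature names and numbers below are assumptions for illustration.

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit(rows, targets):
    """Least-squares weights (intercept first) via the normal equations."""
    x = [[1.0] + list(r) for r in rows]
    k = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(x, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights, row):
    """Predicted PUE for one row of feature values."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], row))


def made_up_pue(load_mw, chillers, outside_temp_c):
    """Invented linear relationship, used only to generate synthetic history."""
    return 1.0 + 0.005 * load_mw + 0.01 * chillers + 0.002 * outside_temp_c


# Each row: (server load in MW, chillers running, outside air temperature in C)
history = [
    (10.0, 2, 15.0), (12.0, 3, 22.0), (8.0, 1, 9.0),
    (11.0, 2, 18.0), (9.0, 2, 25.0), (13.0, 3, 12.0),
]
observed_pue = [made_up_pue(*row) for row in history]

weights = fit(history, observed_pue)
forecast = predict(weights, (10.5, 2, 20.0))
```

Because the synthetic history is exactly linear, the fit recovers the invented coefficients. Real facility data would be noisier and the interactions non-linear, which is the reason a more flexible model is attractive.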
The insights generated give the team information on where to find savings and even when to carry out maintenance tasks or a technology refresh.
“Jim’s models are now 99.6% accurate in predicting PUE. This means we can come up with new ways to squeeze more efficiency out of our operations,” Kava said.
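The article does not say how that accuracy figure was measured. One common reading, assumed here, is one minus the mean absolute percentage error between predicted and measured PUE. The values in this Python sketch are invented.

```python
def mean_absolute_percentage_error(actual, predicted):
    """Average of |actual - predicted| / actual across all samples."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical measured and predicted PUE values (assumptions, not Google data).
actual = [1.10, 1.12, 1.09, 1.11]
predicted = [1.105, 1.115, 1.094, 1.106]

mape = mean_absolute_percentage_error(actual, predicted)
accuracy = 1.0 - mape  # expressed as a fraction; multiply by 100 for a percentage
```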
A few months ago, Google took some servers from one datacentre offline for a few days. “Normally, this would make that datacentre less energy efficient. But we were able to use Jim’s models to change our cooling setup temporarily, reducing the impact of the change on our PUE for that time period.
“Small tweaks like this, on an ongoing basis, add up to significant savings in both energy and money.”
Kava shared a technical white paper detailing Gao’s machine learning model and urged datacentre operators, engineers and managers to look at the system.
“It doesn’t take a super-cluster or a supercomputer to do this. Our machine learning model for PUE runs on just one server. You can run it off one desktop too depending on the size of your organisation,” he said.
But he also issued a warning. “Models are only as good as the data you put in. If we put garbage in, we get garbage out.
“We had to do quite a bit of data scrubbing. If you are going down this path, I would recommend that your engineers do a bit of studying around machine learning or your accuracy will not be that good – you will have to do your homework well.”
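As a rough illustration of what such data scrubbing might involve, the Python sketch below drops readings that are incomplete or physically implausible before they reach a model. The field names and the plausibility thresholds are assumptions and are not taken from Google's process.

```python
def scrub(readings):
    """Keep only readings that are complete and physically plausible."""
    clean = []
    for r in readings:
        values = (r.get("it_kwh"), r.get("total_kwh"), r.get("outside_temp_c"))
        if any(v is None for v in values):
            continue  # a sensor dropped out, so the reading is incomplete
        it_kwh, total_kwh, temp = values
        if it_kwh <= 0 or total_kwh < it_kwh:
            continue  # would imply a PUE below 1.0, which is impossible
        if not -60.0 <= temp <= 60.0:
            continue  # assumed plausible range for outside air temperature
        clean.append(r)
    return clean


# Hypothetical raw readings (invented values).
raw = [
    {"it_kwh": 1000.0, "total_kwh": 1120.0, "outside_temp_c": 14.0},
    {"it_kwh": 1000.0, "total_kwh": 950.0, "outside_temp_c": 14.5},   # total < IT
    {"it_kwh": None, "total_kwh": 1100.0, "outside_temp_c": 15.0},    # missing value
    {"it_kwh": 990.0, "total_kwh": 1105.0, "outside_temp_c": 999.0},  # sensor glitch
]
clean = scrub(raw)
```

Only the first reading survives here. In practice the checks would be tuned to each site's sensors and climate.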