In this special edition of the Computer Weekly Downtime Upload podcast, Gartner analyst Tiny Haynes discusses datacentre energy management.
The European Commission (EC) has set out plans to curb rising energy prices in Europe. As part of this plan, member states will be required to identify the 10% of hours with the highest expected price and reduce demand during those peak hours. The EC also proposes that member states aim to reduce overall electricity demand by at least 10% until 31 March 2023.
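As a rough illustration of what identifying the 10% of hours with the highest expected price could look like, the sketch below ranks a day of hourly price forecasts and flags the top tenth. The function name `peak_hours`, the rounding-up rule and the price figures are assumptions made for this example; they do not come from the EC proposal.

```python
import math

def peak_hours(expected_prices, share=0.10):
    """Return the indices of the hours with the highest expected price.

    expected_prices: forecast prices, one per hour (hypothetical data).
    share: fraction of hours to flag as peak (10% in the EC plan).
    """
    # Rounding up is an assumption; the plan does not specify a rule here.
    n_peak = math.ceil(len(expected_prices) * share)
    # Rank hour indices by price, highest first, and keep the top n_peak.
    ranked = sorted(range(len(expected_prices)),
                    key=lambda h: expected_prices[h], reverse=True)
    return sorted(ranked[:n_peak])

# Made-up day-ahead prices in EUR/MWh for 24 hours (index 0 = midnight).
prices = [210, 195, 180, 175, 178, 200, 260, 340, 390, 360, 330, 310,
          300, 295, 305, 335, 380, 450, 470, 420, 350, 290, 250, 225]

print(peak_hours(prices))  # [17, 18, 19]
```

With these invented figures, the early evening hours are the ones in which demand would need to be reduced.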
There is likely to be an impact on business and this is set to influence energy use in the power-hungry IT sector. For instance, in August, driven by the German government’s decision to curb energy use, Deutsche Bank said reducing the amount of energy used for cooling in technical rooms would be one of a number of energy-saving plans it will put in place.
Arm offers a helping hand
The timing of the EC’s announcement ties in with Arm unveiling a radical strategy to establish its processors in the datacentre. Arm has set out its Neoverse roadmap to deliver what it claims is an alternative to “legacy” x86 server infrastructure. The idea is that public cloud providers and server manufacturers will be able to offer greater performance at a lower price and with a lower “price per watt” value.
For instance, Amazon Web Services’ (AWS) Arm instances use its own Graviton3 chips, which the cloud provider says consume up to 60% less energy for the same performance than comparable EC2 instances. As part of its push to encourage the migration of workloads to Arm-based systems, the chip maker says it is working with the major cloud providers to provide tools to help develop software that can run on its processors.
“With all major cloud providers offering Arm-based instances, we are collaborating with our cloud partners to optimise cloud-native software infrastructure, frameworks and workloads. These include contributions to widely adopted projects such as Kubernetes, Istio and multiple CI/CD tools for providing native builds for the Arm architecture,” says Chris Bergey, senior vice-president and general manager responsible for Arm’s infrastructure line of business.
“We’ve enhanced machine learning frameworks such as TensorFlow and cloud software workloads such as open source databases, big data and analytics and media processing. This work is happening in collaboration with developers who are contributing to open source communities so all end users can build their next-generation applications on Arm,” he adds.
It is highly unlikely that organisations will invest heavily in migrating x86 workloads to Arm servers, but Arm may offer a compelling alternative for new applications, which will enable its footprint in datacentre computing to grow organically.
To tackle the ongoing energy crisis in the meantime, IT administrators will need to act far more quickly. As Gartner senior director analyst Tiny Haynes points out in this special edition of the Computer Weekly Downtime Upload podcast, the tech sector has been working for years to reduce energy consumption. This is mainly due to the limits set by electricity grids.
Haynes says servers are now far better at utilising processing resources. “Before virtualisation, the utilisation of CPUs was in the 20% range. With virtualisation, you’re getting 60% to 70% utilisation. But you still have some opportunity for greater utilisation,” he says.
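To put those utilisation figures in perspective, here is a back-of-the-envelope sketch of how many servers the same amount of work needs at different utilisation levels. It assumes identical servers and workloads that can be freely consolidated, which real estates rarely allow; the function name `servers_needed` and the fleet size of 100 are invented for the example.

```python
import math

def servers_needed(current_servers, current_util, target_util):
    """Estimate the servers required to carry the same work at a higher utilisation.

    Simplifying assumption: all servers are identical and any workload
    can be moved onto any server.
    """
    # Total work, expressed in "fully busy server" units.
    work = current_servers * current_util
    return math.ceil(work / target_util)

# Hypothetical fleet of 100 servers running at 20% utilisation,
# consolidated to 65% (the middle of the 60% to 70% range quoted above).
print(servers_needed(100, 0.20, 0.65))  # 31
```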
Haynes believes systems management tools can help system administrators eke out more efficiency from existing server infrastructure. “There’s a cost associated with systems management tools, but there might be a payback, and some systems management tools providers show you that the return on investment is there as well.”
He says one of the areas IT chiefs should focus on is ownership of IT hardware. “People have a mentality that if they get a business case approved for a server on a particular project, then they will hold on to that server, which doesn’t really allow this IT resource to be shared.”
Systems management tools provide both a control plane for server infrastructure and a way to identify usage patterns. “Try and see what servers you have,” Haynes adds. “And look at how you can actually share these resources further across more parts of the organisation.”
Such sharing saves on the costs incurred in procuring additional server resources. An extra server will require extra connectivity, cooling and electricity, all of which have a direct bearing on server room energy bills.
Given the inevitable price hikes that are coming due to rises in inflation and tough economic conditions, Haynes says organisations should look at how they can try to extend the life of existing servers. “This means not having to replace things automatically after three years unless there’s actually a performance requirement which can’t be met by existing IT infrastructure. These things are easy wins.”
Another area to look at is the uptime of servers. While server hardware is designed to run 24/7, some workloads may only be needed during peak working hours. They can be throttled back to consume less electricity. “Why are servers on at night? Maybe they are doing backups? If you’ve got a traditional tape-based backup system, see if it is possible to back up to disk.”
Because a disk backup completes more quickly than a tape backup, the servers spend less time at full power. They can then be put into standby mode sooner, or set up so that their power usage is just a trickle of what they normally need.
Modern equipment can be designed to run more efficiently. Workloads that require power-hungry arrays of graphics processing units (GPUs), such as artificial intelligence, may in fact consume less energy overall on such hardware than on traditional servers.
This, as Ondrej Burkacky, a partner in McKinsey’s global semiconductors and business technology practices, explains, is because the highly parallel architecture of a GPU array can run workloads far faster than if the same workload were run on a CPU, meaning that the overall energy consumption of the GPU-based workload would be less.
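The reasoning rests on energy being power multiplied by time: a system that draws more power can still use less energy if it finishes soon enough. The sketch below works through that arithmetic with invented figures; the wattages and run times are assumptions for illustration only, not measurements from McKinsey or any supplier.

```python
def energy_kwh(power_watts, hours):
    """Energy consumed in kilowatt-hours: power (W) x time (h) / 1000."""
    return power_watts * hours / 1000

# Hypothetical job: a CPU server drawing 500W needs 10 hours,
# while a GPU array drawing 3,000W finishes in 1 hour.
cpu_energy = energy_kwh(power_watts=500, hours=10)
gpu_energy = energy_kwh(power_watts=3000, hours=1)

print(cpu_energy)  # 5.0
print(gpu_energy)  # 3.0
```

In this made-up case, the GPU array draws six times the power but completes the job 10 times faster, so it uses less energy overall. If the speed-up were smaller than the increase in power draw, the comparison would go the other way.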
A system-wide approach to energy efficiency
The GPU example can be applied across any system built on semiconductors. The approach is an alternative to focusing on the power consumption of each individual chip.
“A chip itself uses power. You can make it more power-efficient so that it uses less power, but one of the areas that is trending now is system design,” says Burkacky. While the hardware may require more chips and so have a larger energy footprint, “you can design a system that basically is helping to improve power efficiency as a total of the system, or you design something which is much more dynamic and, depending on the workload, can adjust its energy consumption”.
For instance, he says hardware could be developed to ensure a pump runs optimally. The same techniques can be applied in smart building technology to help organisations reduce overall energy consumption in their offices.
Overall, the industry has continued its push to greater levels of energy efficiency. Choice of hardware architecture such as using GPUs or Arm going forward, combined with a system-wide approach to energy efficiency, may help in the long term. But, as Gartner’s Haynes points out, there is still plenty of headroom to increase utilisation – and as a result, energy efficiency – of existing x86 servers, towards 90% and above.
What happens when there’s a power cut?
If the worst happens and the energy supply fails in a power cut, Tiny Haynes, a senior analyst at Gartner, says an uninterruptible power supply (UPS) battery should be able to keep the servers running for about 30 minutes on full load.
“A diesel generator takes about a minute to spin up online,” he says, but the challenge for administrators is how long the servers can be kept running off the generator. “There’s the fuel supply – and the problem is, if you’ve got a huge fuel supply, you have to make sure you’re protecting against the diesel bug.” The diesel bug is contamination of the fuel due to microorganisms.
Haynes says a server room generator may only have sufficient diesel fuel to run for a couple of hours during a blackout. CIOs must then consider their plans for an extended blackout.
Many organisations take advantage of public cloud IT infrastructure, which can be utilised if their on-premise facility is offline. But as Haynes points out, lifting and shifting on-premise workloads to the public cloud leads to huge running costs. “It’s a bit like taking a taxi home and keeping it in your garage overnight with the meter running,” he says.
He urges IT decision-makers who want to use the public cloud as a way to keep server software operations running during extended blackouts to buy reserved instances from the public cloud providers, which only incur costs once the virtual servers are running.