
A waste of energy: Dealing with idle servers in the datacentre

Servers can be the slackers of the datacentre, consuming power without doing much work. Why not whip them into shape?

The Uptime Institute estimated as far back as 2015 that idle servers could be wasting around 30% of their consumed energy, with improvements fuelled by trends such as virtualisation having largely plateaued.

According to Uptime, the proportion of power consumed by “functionally dead” servers in the datacentre looks to be creeping up again, which is not what operators want to hear as they struggle to contain costs and target sustainability.

Todd Traver, vice-president for digital resiliency at the Uptime Institute, confirms that the issue is worthy of attention. “The analysis of idle power consumption will drive focus on the IT planning and processes around application design, procurement and the business processes that enabled the server to be installed in the datacentre in the first place,” Traver tells ComputerWeekly.

Yet higher-performance multi-core servers, which can draw 20W or more of extra idle power compared with lower-power servers, can deliver performance improvements of over 200%, he notes. A datacentre myopically focused on reducing the power consumed by servers would therefore be driven towards the wrong buying behaviour.

“This could actually increase overall power consumption since it would significantly sub-optimise the amount of workload processed per watt consumed,” warns Traver.
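Traver's point can be illustrated with some rough arithmetic. The sketch below is not based on Uptime data: the wattages, throughput figures, demand and the linear power model are all assumptions chosen only to mirror the "20W more idle power, 200% more performance" comparison.

```python
import math
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    idle_watts: float   # power drawn when doing no work (assumed figure)
    busy_watts: float   # power drawn at full load (assumed figure)
    throughput: float   # work units per hour at full load (assumed figure)


def average_watts(server: Server, utilisation: float) -> float:
    """Interpolate linearly between idle and busy power (a simplification)."""
    return server.idle_watts + (server.busy_watts - server.idle_watts) * utilisation


def fleet_watts(server: Server, demand: float, utilisation: float) -> float:
    """Total power to serve `demand` work units per hour at a given utilisation."""
    per_server_output = server.throughput * utilisation
    servers_needed = math.ceil(demand / per_server_output)
    return servers_needed * average_watts(server, utilisation)


# Hypothetical servers: the multi-core box idles 20W higher but does 3x the work.
low = Server("low-power", idle_watts=50, busy_watts=150, throughput=100)
high = Server("multi-core", idle_watts=70, busy_watts=250, throughput=300)

for s in (low, high):
    print(s.name, fleet_watts(s, demand=3000, utilisation=0.5), "W")
```

Under these assumed figures, the multi-core fleet serves the same demand with 3,200W against 6,000W for the low-power fleet, despite each machine's higher idle draw.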

So, what should be done?

Datacentre operators can play a role in helping to reduce idle power by, for instance, ensuring the hardware provides performance based on the service-level objectives (SLOs) required by the applications it must support. “Some IT shops tend to over-purchase server performance, ‘just in case’,” adds Traver.

He notes that IT teams worried about application performance may resist, but with careful planning, many applications should easily withstand properly implemented hardware power management without affecting end users or SLO targets.

Start by sizing server components and capabilities for the workload and understanding the application and its requirements alongside throughput, response time, memory use, cache, and so on. Then ensure hardware C-state power management functions are turned on and used, says Traver.

Stage three is continuous monitoring and increasing of server utilisation, with software available to help balance workload across servers, he adds.
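As a starting point for the second step, an administrator on Linux could check which idle states the kernel exposes for a CPU. The sketch below reads the kernel's cpuidle sysfs directory. It assumes a Linux host with cpuidle enabled, the function name is hypothetical, and firmware settings can still restrict C-states regardless of what is shown here.

```python
from pathlib import Path


def list_cstates(cpuidle_dir: str = "/sys/devices/system/cpu/cpu0/cpuidle"):
    """Return (state name, enabled) pairs for one CPU, in state order.

    Returns an empty list if the directory does not exist, for example on
    a non-Linux host or a kernel without cpuidle support.
    """
    root = Path(cpuidle_dir)
    if not root.is_dir():
        return []
    state_dirs = sorted(
        root.glob("state[0-9]*"),
        key=lambda p: int(p.name[len("state"):]),  # numeric, so state10 follows state9
    )
    states = []
    for state_dir in state_dirs:
        name = (state_dir / "name").read_text().strip()
        disable_file = state_dir / "disable"
        # A "1" in the disable file means the kernel will not use this state.
        disabled = disable_file.exists() and disable_file.read_text().strip() == "1"
        states.append((name, not disabled))
    return states


if __name__ == "__main__":
    for name, enabled in list_cstates():
        print(f"{name}: {'enabled' if enabled else 'disabled'}")
```

Taking the directory as a parameter keeps the function easy to point at another CPU or at a test fixture.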

Sascha Giese, head geek at infrastructure management provider SolarWinds, agrees: “With orchestration software, which is in use in bigger datacentres, we would actually be able to dynamically shut down machines that are no use right now. That can help quite a lot.”

Improving the machines themselves and changing mindsets remains important – shifting away from an over-emphasis on high performance. Shutting things down might also extend hardware lifetimes.

Giese says that even with technological improvements happening at server level and increased densities, broader considerations remain that go beyond agility. It’s all one part of a larger puzzle, which might not offer a perfect solution, he says.

New thinking might address how energy consumption and utilisation are measured and interpreted, which can be different within different organisations and even budgeted for differently.

“Obviously, it is in the interest of administrators to provide a lot of resources. That’s a big problem because they might not consider the ongoing costs, which is basically what you’re after in the big picture,” says Giese.

Designing power-saving schemes

Simon Riggs, PostgreSQL fellow at managed database provider EDB, has worked frequently on power consumption code as a developer. When implementing power reduction techniques in software, including PostgreSQL, the team starts by analysing the software with the Linux tool PowerTOP to see which parts of the system wake up when idle. Then they look at the code to learn which wait loops are active.

A typical design pattern for normal operation might be waking when requests for work arrive or every two to five seconds to recheck status. After 50 idle loops, the pattern might be to move from normal to hibernate mode but move straight back to normal mode when woken for work.

The team reduces power consumption by extending wait loop timeouts to 60 seconds, which Riggs says gives a good balance between responsiveness and power consumption.
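The pattern Riggs describes can be sketched in a few lines. This is not EDB's or PostgreSQL's code: the class and function names are hypothetical, and it assumes the 60-second timeout applies once the loop has entered hibernate mode, with five seconds used for normal mode.

```python
import queue

NORMAL_TIMEOUT_S = 5.0            # recheck every two to five seconds in normal mode
HIBERNATE_TIMEOUT_S = 60.0        # extended wait loop timeout (assumed for hibernate)
IDLE_LOOPS_BEFORE_HIBERNATE = 50  # idle wake-ups before switching mode


class IdleLoopPolicy:
    """Counts consecutive idle wake-ups and picks the next wait timeout."""

    def __init__(self) -> None:
        self.idle_loops = 0

    @property
    def hibernating(self) -> bool:
        return self.idle_loops >= IDLE_LOOPS_BEFORE_HIBERNATE

    def next_timeout(self) -> float:
        return HIBERNATE_TIMEOUT_S if self.hibernating else NORMAL_TIMEOUT_S

    def record(self, got_work: bool) -> None:
        # Any real work sends the loop straight back to normal mode.
        self.idle_loops = 0 if got_work else self.idle_loops + 1


def worker(requests: "queue.Queue", policy: IdleLoopPolicy, handle) -> None:
    """Sleep until work arrives or the timeout expires; a None item stops the loop."""
    while True:
        try:
            item = requests.get(timeout=policy.next_timeout())
        except queue.Empty:
            policy.record(got_work=False)
            continue  # a real service would recheck its status here
        if item is None:
            return
        policy.record(got_work=True)
        handle(item)
```

Because the worker blocks on the queue rather than polling, a request that arrives during hibernation is still picked up immediately. The longer timeout only reduces how often an idle process wakes the CPU.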

“This scheme is fairly easy to implement, and we encourage all software authors to follow these techniques to reduce server power consumption,” Riggs adds. “Although it seems obvious, adding a ‘low power mode’ isn’t high on the priority list for many businesses.”

Progress can and should be reviewed regularly, he points out – adding that he has spotted a few more areas that the EDB team can clean up when it comes to power consumption coding while maintaining responsiveness of the application.

“Probably everybody thinks that it’s somebody else’s job to tackle these things. Yet, perhaps 50-75% of servers out there are not used much,” he says. “In a business such as a bank with 5,000-10,000 databases, quite a lot of those don’t do that much. A lot of those databases are 1GB or less and might only have a few transactions per day.”

Jonathan Bridges is chief innovation officer at cloud provider Exponential-e, which has a presence in 34 UK datacentres. He says that cutting back on powering inactive servers is crucial for datacentres looking to become more sustainable and make savings. So many workloads – including cloud environments – sit idle for large chunks of time, and scale-out has often not been architected effectively.

“We’re finding a lot of ghost VMs [virtual machines],” Bridges says. “We see people trying to put in software technology so cloud management platforms typically federate those multiple environments.”

Persistent monitoring may reveal underutilised workloads and other gaps, which can be targeted with automation and business process logic to enable switch-off, or at least a more strategic business choice around the IT spend.
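As a rough illustration of that kind of check, the sketch below flags workloads whose CPU use stays low across a set of monitoring samples. The thresholds, the VM names and the sample readings are all hypothetical, and a real check would also look at network, disk and business context before anything is switched off.

```python
from statistics import mean
from typing import Dict, List


def find_idle_candidates(
    samples: Dict[str, List[float]],
    mean_pct: float = 5.0,   # assumed threshold for average CPU use
    peak_pct: float = 20.0,  # assumed threshold for the highest reading
    min_samples: int = 3,    # skip workloads with too little history
) -> List[str]:
    """Return workloads whose CPU readings suggest they are barely used."""
    candidates = []
    for name, readings in sorted(samples.items()):
        if len(readings) < min_samples:
            continue  # not enough data to judge
        if mean(readings) < mean_pct and max(readings) < peak_pct:
            candidates.append(name)
    return candidates


# Hypothetical CPU utilisation readings, in percent.
cpu_samples = {
    "vm-reports": [1.0, 2.0, 1.5, 0.5],   # consistently quiet
    "vm-web": [35.0, 60.0, 42.0, 55.0],   # clearly in use
    "vm-batch": [0.5, 0.5, 90.0, 0.5],    # quiet but with a real burst
    "vm-new": [1.0],                      # too little history
}

print(find_idle_candidates(cpu_samples))
```

Checking the peak as well as the average keeps bursty batch jobs off the list, so the output is a set of candidates for review rather than an automatic switch-off list.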

However, what typically happens, especially with the prevalence of shadow IT, is that IT departments don’t actually know what’s happening. These problems can also become more prevalent as organisations grow, spread and disperse globally, and manage multiple off-the-shelf systems that weren’t originally designed to work together, Bridges notes.

“Typically, you monitor for things being available, you monitor more for performance on things. You’re not really looking into those to work out that they’re not being consumed,” he says. “Unless they’re set up to look across all the departments and also not to do just traditional monitoring and checking.”

Refactoring applications to become cloud-native, whether for public cloud or on-premises containerisation, might present an opportunity to build applications that scale up – or down – more efficiently, helping to reduce power consumption per server.

While power efficiency and density improvements have been achieved, the industry should now be seeking to do better still – and quickly, Bridges suggests.

Organisations setting out to assess what is happening might find that they’re already quite efficient, but more often than not they might find some overprovisioning that can be tackled without waiting for new tech advancements.

“We’re at a point in time where the challenges we’ve had across the world, which has affected the supply chain and a whole host of things, are seeing the cost of energy skyrocket,” Bridges says. “Cost inflation on power alone can be adding 6-10% on your cost.”

Ori Pekelman, chief product officer at platform-as-a-service (PaaS) provider Platform.sh, agrees that server idle issues can be tackled. However, he insists that it must come back to a reconsideration of the overall mindset on the best ways to consume computer resources.

“When you see how software is running today in the cloud, the level of inefficiency you see is absolutely ridiculous,” he says.

Inefficiency not in isolation

Not only are servers running idle, but there are also all of the other considerations around sustainability, such as Scope 3 calculations, which cover indirect emissions across the supply chain. For example, upgrades might turn out to have a net negative effect once the emissions from manufacturing the new kit are counted, even if daily server power consumption is lower after installing it.

The move to cloud itself can obscure some of these considerations, simply because bills for energy and water use and so on are abstracted away and not in the end user’s face.

And datacentre providers themselves can also have incentives to obscure some of those costs in the drive for business and customer growth.

“It’s not simply about idle servers,” Pekelman says. “And datacentre emissions have not ballooned over the past 20 years. The only way to think about this is to take a while to build the models – robust models that take into account a number of years and don’t concentrate only on energy usage per server.”

Fixing these issues will require more engineering and “actual science”, he warns. Providers are still using techniques that are 20 years old, while still not being able to share and scale better-utilised loads when usage patterns are already “very full”. This might mean, for example, reducing duplicated images where possible and instead having only a single copy on every server.

Workloads could also be localised or dynamically shifted around the world – for example, to Sweden instead of France, where they would be supplied with nuclear power – depending on your perspective of the benefits of those energy sources. Some of this might require trade-offs in other areas, such as availability and the latencies required, to achieve the flexibility needed.

This might not be what datacentre providers want for themselves, but should ultimately help them deliver what customers are increasingly likely to be looking for.

“Generally, if you’re not a datacentre provider, your interests are more aligned with those of the planet,” Pekelman suggests. “Trade off goals versus efficiency, perhaps not now but later. The good news is that it means doing software better.”
