Facebook optimises software infrastructure for datacentre power efficiency

Facebook has developed a system called Autoscale to optimise its software infrastructure and make its datacentre more energy efficient

After making its datacentre hardware and server components more energy efficient, Facebook has now turned to optimising its software infrastructure for further power efficiency and reduced environmental impact.

The social network has developed a system called Autoscale for power-efficient load balancing. Currently used in its production clusters, Autoscale has already provided “significant energy savings”, the social media giant said.

“We’ve talked a lot about our progress on energy-efficient hardware and datacentre design through the Open Compute Project, but we’ve also started looking at how we could improve the energy efficiency of our software,” said Qiang Wu, an infrastructure software engineer at Facebook, on the company’s engineering blog.

The datacentre engineers developed Autoscale after exploring multiple avenues for improving power efficiency, including power modelling and profiling, peak power management and energy-proportional computing.

On the hardware front, Facebook revealed that it has reduced costs by 24% and increased IT energy efficiency by 38% since it started using open-source hardware systems in its datacentres.

The open-source systems are based on the Open Compute Project, which was initiated in April 2011 by a small group of Facebook engineers looking to scale the company’s computing infrastructure in the most efficient and economical way possible.

But progress in hardware efficiency alone will not help Facebook meet its goal of reducing its carbon footprint as its datacentre and IT infrastructure grow.

Facebook’s total energy use in 2013 was 822 million kilowatt hours (kWh), with more than 95% of this energy used in its four datacentres.

The social network’s energy use and carbon footprint rose in 2013, despite its increasing focus on datacentre energy efficiency and use of clean and renewable energy sources. The combined carbon footprint of all its datacentres (three in the US and one in Europe) for 2013 was 355,000 metric tonnes of CO2e, compared with 285,000 metric tonnes of CO2e in 2011.

Its engineers are now also exploring ways to optimise its software components, to reduce the company’s environmental impact as its IT footprint expands.

“We are still in the early stages of optimising our software infrastructure for power efficiency, and we’re continuing to explore opportunities in different layers of our software stack to reduce datacentre power and energy usage,” Wu said. “We hope that through continued innovation, we will make Facebook’s infrastructure more efficient and environmentally sustainable.”

How does Autoscale work and yield power savings for Facebook?

Every day, Facebook’s web clusters handle billions of page requests, which drive up server utilisation, especially during peak hours. The company’s default load-balancing policy is based on a round-robin algorithm – every server receives roughly the same number of page requests and uses roughly the same amount of CPU. However, the engineers found that this approach is not very power efficient.

This is because some of its web servers consume a high amount of power (130 watts) even when they are handling only a small number of requests per second. The datacentre team decided to avoid running servers at such low load levels.

“To tackle this problem and utilise power more efficiently, we changed the way that load is distributed to the different web servers in a cluster,” Wu explained in his blog post.

“The basic idea of Autoscale is that instead of a purely round-robin approach, the load balancer will concentrate workload to a server until it has at least a medium-level workload.”

Following the implementation of Autoscale, the load balancer now uses an active, or “virtual”, pool of servers, which is essentially a subset of the physical server pool. Autoscale adjusts the active pool size so that each active server will get at least medium-level CPU utilisation regardless of the overall workload level. The servers that aren’t in the active pool don’t receive traffic.
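The pool-sizing idea can be illustrated with a minimal Python sketch. It is not Facebook’s implementation, which the blog post does not publish: the function names, the per-server capacity of 100 requests per second and the 50% “medium” utilisation target are all assumptions chosen for illustration.

```python
import math


def active_pool_size(load_rps, server_capacity_rps, target_utilisation, total_servers):
    """Pick how many servers should receive traffic.

    Hypothetical helper: chooses the largest pool in which every active
    server still reaches at least `target_utilisation`, bounded between
    one server and the size of the physical pool.
    """
    ideal = math.floor(load_rps / (server_capacity_rps * target_utilisation))
    return max(1, min(total_servers, ideal))


def per_server_utilisation(load_rps, server_capacity_rps, active_servers):
    """CPU utilisation of each active server, assuming load is spread evenly."""
    return load_rps / (active_servers * server_capacity_rps)


# Assumed figures: 20 physical servers, each able to serve 100 requests/s,
# and a "medium" utilisation target of 50%.
servers = [f"web{i:02d}" for i in range(20)]
low_traffic_rps = 300

n_active = active_pool_size(low_traffic_rps, 100, 0.5, len(servers))
active_pool = servers[:n_active]      # these servers receive traffic
inactive_pool = servers[n_active:]    # these receive none

round_robin_util = per_server_utilisation(low_traffic_rps, 100, len(servers))
autoscale_util = per_server_utilisation(low_traffic_rps, 100, n_active)
print(f"round-robin: {len(servers)} servers at {round_robin_util:.0%} each")
print(f"active pool: {n_active} servers at {autoscale_util:.0%} each")
```

Under these assumed figures, plain round-robin would spread low traffic thinly across all 20 servers, while the active pool shrinks to six servers running at the medium target. When traffic is too low to fill even one server, the sketch keeps a single server active, so the “at least medium” guarantee cannot hold in that corner case.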

“Though the idea sounds simple, it is a challenging task to implement effectively and robustly for a large-scale system,” Wu admitted.

In one production web cluster, Autoscale delivered power savings of 27% around midnight, when traffic was low. The average power saving over a 24-hour cycle was about 10-15% across different web clusters.

“In a system with a large number of web clusters, Autoscale can save a significant amount of energy,” Wu said.

“In the current stage, we’ve decided to either leave 'inactive' servers running idle to save energy or to repurpose the inactive capacity for batch-processing tasks. Both ways improved our overall datacentre energy efficiency,” he added.
