Social media conglomerate Facebook on streamlining its hyperscale datacentre buildouts

Facebook offers a look at how its approach to designing and building its datacentres has evolved to keep pace with the demands of its growing user base

Social media conglomerate Facebook has lifted the lid on how an internal revamp of its building management system (BMS) processes back in 2017 set it on course to rapidly expand its global fleet of hyperscale datacentres in the years to come.

Speaking at the virtual Schneider Electric Innovation Summit conference, Jeff Ronan, global BMS technical strategy lead at Facebook, provided attendees with a behind-the-scenes look at the steps the firm’s leadership team have taken to deliver on its datacentre expansion plans.

Ronan joined Facebook in 2012, at a time when the company was starting to design its own greenfield datacentres. He was the first BMS engineer the company had hired. “I was employee number 3,000-ish, and we were in the middle of building our first two datacentres [and] our entire datacentre design, construction and operations team at that time was about two dozen-strong,” he said.

The BMS programme was still very much in its infancy. “We had no standards, and we had very loosely defined sequences and specifications. When I was hired, my marching orders were: ‘Just look around and try to find a way to make us better’,” said Ronan.

The urgency is not difficult to fathom, as 2012 was an eventful year for Facebook. Not only did the platform hit one billion active users, but its leadership team also strengthened its hold on the social media market through its $1bn acquisition of photo-sharing site Instagram.

The company needed a resilient and scalable datacentre expansion plan to accommodate the growth in users and interactions that its growing roster of social media platforms could go on to generate.

This challenge would become even more pressing in the years that followed, with the take-off of Facebook Messenger, along with the firm’s 2014 purchase of instant messaging service WhatsApp. “In early 2017, our leadership released an incredibly aggressive [datacentre] build schedule, and – at the same time – we were reaching the limits of our current direct digital control [DDC] platform,” said Ronan.

DDC systems

DDC systems are control systems that are typically deployed within commercial properties and offices to regulate their heating, ventilation and air conditioning (HVAC) systems, but they are not always considered a good fit for mission-critical environments, including datacentres.  

“Our datacentres are considered critical facilities, which teeter on the line between commercial and industrial [use cases] when considering a building management system,” said Ronan.

Some of the issues the company was running into with its DDC platform stemmed from the fact that it was proprietary, which limited the range of technology partners Facebook could work with. The platform also lacked “inherent redundancy”.

“We needed to solve all these problems if [our datacentre buildouts were] going to become more repeatable and scalable,” he said.

Over the course of a month, Facebook’s operations, design and construction teams carried out an assessment to establish how best to address these issues and to identify a suitable replacement for the proprietary DDC platform.

“[They] agreed to pivot to a PLC [programmable logic controller] platform and a delivery model that could provide a more inherently redundant solution with increased speed to market,” said Ronan.

“[This meant] a programmable logic controller that can be procured through any wholesaler, and installed and commissioned by any qualified system integrator, and this really opened up our options for us,” he added.


PLCs are classed as industrial control systems. They typically consist of a ruggedised computer that can be programmed to automate and regulate processes, such as those in a datacentre, in a more tailored and customisable way.

Having decided to swap out its DDC platform for PLCs, the company then needed to find a supplier for the technology, and ultimately chose Schneider Electric.

“We needed a partner with global reach, that could support every facet of our BMS business needs across the globe,” he said. “We [also] needed a partner large and flexible enough to go fast and change course on a dime. We needed a partner that would work closely with us to understand our challenges and develop solutions together.

“They would need to be experts in the platform hardware, but also on HVAC and controls. We [also] needed a partner that would allow us to influence their product roadmaps and build tools that supported our ever-growing complex designs and processes.”

Following a “long and arduous” interview process, Schneider emerged as the supplier that could meet its criteria, said Ronan. “And we felt we could trust [the company] to help us meet our global growth.”

Ronan’s team had recommended to Facebook’s senior leadership team that they proceed relatively slowly with the Schneider contract by enlisting the firm to work with it on a pilot deployment. “[That was so] we could glean lessons to help us build out a repeatable, high-quality execution plan,” he said, but Facebook’s management team had other ideas about how the engagement should proceed.

“Our leadership team made a ‘rip off the band aid’ decision to immediately pivot all design and upcoming projects to PLC,” said Ronan. “It meant a whole lot of work for everyone, but – in hindsight – it was the correct call.”

Design completion

At the time of this decision, the company had four datacentre regions close to design completion, and two regional expansions that had just been finished, he recalled.

“The decision was to pivot for each of these and all projects moving forward,” he said. “That meant a complete specification rewrite, the development of all new PLC-based standards, building a PLC delivery model while also gaining buy-in and acceptance from all of our downstream partners. We also had to prepare our internal teams to manage completely new or different workflows.”

One of the first things Schneider Electric did after securing the Facebook contract was to create a cross-functional team of experts from the worlds of building management and industrial systems, recalled Bill Westbrock, Schneider Electric’s global strategy account executive.

“We started out building a cross-functional team that draws from Schneider Electric experts from not only the building management side which focuses on the HVAC categories, but also on the industrial automation side to make sure we were capturing the PLC industrial automation part of it, blending those things together,” he said.

On the Facebook side, the company also adopted a new delivery model for its datacentre projects, designed to ensure each contractor was accountable for, and took ownership of, the particular deliverable it was responsible for.

“It was a model that none of our partners had ever experienced, and it came with some significant growing pains, but [as a project owner] we knew the only way to get what we really wanted was to make this ourselves,” said Ronan.

“As such, we created a delivery structure with direct owner oversight of the development and delivery of all the hardware, design and programming elements associated with BMS. At the project level, the general contractors hired independent system integrators to install and commission each individual site.”


To show how the shift in strategy had accelerated Facebook’s datacentre buildout, Ronan explained that the firm had one datacentre, with 50 megawatts of capacity, by 2011.

By 2018, however, the company had 18 server farms online with 450 megawatts of capacity. “By the end of 2022, we expect to have about 68 datacentres, serving traffic across the world,” he said.
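The growth Ronan describes can be put into perspective with some simple arithmetic. The sketch below assumes the 450 megawatts is a fleet-wide total, and the per-site figure is only a mean, which says nothing about how capacity was actually distributed across individual sites. The 2022 projection gives a site count only, so no capacity figure is derived for it.

```python
# Figures as reported by Ronan: year -> (number of datacentres, total MW).
# Assumption: the 2018 megawatt figure is the total across all 18 sites.
capacity = {2011: (1, 50), 2018: (18, 450)}

def average_mw_per_site(sites: int, total_mw: float) -> float:
    """Mean capacity per datacentre, in megawatts."""
    return total_mw / sites

sites_2011, mw_2011 = capacity[2011]
sites_2018, mw_2018 = capacity[2018]

# Overall growth in total capacity between the two years.
growth_factor = mw_2018 / mw_2011

# Compound annual growth rate over the seven years from 2011 to 2018.
years = 2018 - 2011
cagr = growth_factor ** (1 / years) - 1

print(f"Capacity growth 2011-2018: {growth_factor:.0f}x")
print(f"Average per site in 2018: {average_mw_per_site(sites_2018, mw_2018):.0f} MW")
print(f"Compound annual growth: {cagr:.1%}")
```

On these figures, total capacity grew ninefold in seven years, or by roughly 37% a year compounded.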

As for how much traffic that amounts to, Ronan said that across Facebook, Messenger, WhatsApp and Instagram, the firm’s datacentres were handling more than 100 billion messages every day by the end of 2020. To accommodate all these interactions, the datacentres Facebook builds are decidedly supersized.

“Not only are these datacentres just extremely large but they [exist] across a 200-acre campus, [and that] campus has multiple data halls on it, as well as [a] high-voltage utility substation,” said Schneider Electric’s Westbrock.

“There’s a renewable energy system and dozens of standby [generators] for the backup requirements, and then, of course, once you get inside, just endless rows and rows of IT rack and infrastructure,” he said. “So much that you need scooters and bikes to get around inside the data hall and to go from building to building on the campus. [They] truly are a remarkable design and accomplishment by Facebook.”
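To give a sense of scale, the figure of more than 100 billion messages a day can be converted into an average per-second rate. This sketch treats 100 billion as a lower bound and assumes traffic is spread evenly across the day; in practice load varies over the day, so peak rates would be higher than this mean.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

def average_rate_per_second(events_per_day: float) -> float:
    """Convert a daily volume into a mean per-second rate (ignores peaks)."""
    return events_per_day / SECONDS_PER_DAY

# "More than 100 billion messages every day" by the end of 2020.
daily_messages = 100e9
rate = average_rate_per_second(daily_messages)

print(f"~{rate:,.0f} messages per second on average")
```

That works out at well over a million messages every second, on average, across the four services.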

Given the sheer size of Facebook’s datacentre footprint, its senior leadership team are continually looking for ways to streamline the buildout process further, and optimising its processes was a key focus for the firm throughout 2020.

“You always have to take a look at what’s working and what’s not and make course corrections, and 2020 became our year of optimisation,” said Ronan. “We’d made it through our original build schedule, but we needed greater organisational alignment, and efficient execution. 

“At the end of 2019 we put an aggressive 2020 roadmap in place, working with [our] leadership team. We reorganised internal team structures for broader alignment [because] we needed to lead our commercial execution processes and create much more repeatable workflows.”

The Librarian

This work led to the development of an automated tool dubbed The Librarian, which Facebook now uses to catalogue all of its disparate mechanical system types into model numbers, said Ronan. “And the Schneider Electric team built a tool that will generate automated deliverables based on [known] bottlenecks. What used to take a month or a week can now take days or even hours to output.”
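Neither Facebook nor Schneider Electric has described how The Librarian works internally. As a purely hypothetical sketch of the general idea of cataloguing system types into model numbers, the Python below gives each distinct equipment configuration one model number, so that repeated configurations share a model and deliverables only need producing once per model. The `MechanicalSystem` and `Catalogue` classes, their fields and the "M-001" numbering scheme are all invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MechanicalSystem:
    """Hypothetical description of one installed piece of mechanical plant."""
    equipment_type: str            # illustrative values, e.g. "air_handler"
    fan_count: int
    has_evaporative_cooling: bool

class Catalogue:
    """Assigns one model number per distinct system configuration.

    Frozen dataclasses are hashable, so identical configurations map to
    the same dictionary key and therefore to the same model number.
    """

    def __init__(self) -> None:
        self._models: dict[MechanicalSystem, str] = {}

    def model_number(self, system: MechanicalSystem) -> str:
        if system not in self._models:
            # Hypothetical sequential, zero-padded scheme: M-001, M-002, ...
            self._models[system] = f"M-{len(self._models) + 1:03d}"
        return self._models[system]

    def variant_count(self) -> int:
        """Number of distinct configurations (design variants) seen so far."""
        return len(self._models)

# A made-up site with five installed units but only three configurations.
site = [
    MechanicalSystem("air_handler", 4, True),
    MechanicalSystem("air_handler", 4, True),
    MechanicalSystem("air_handler", 6, True),
    MechanicalSystem("pump", 1, False),
    MechanicalSystem("pump", 1, False),
]

catalogue = Catalogue()
assigned = [catalogue.model_number(unit) for unit in site]
print(assigned)                   # ['M-001', 'M-001', 'M-002', 'M-003', 'M-003']
print(catalogue.variant_count())  # 3
```

Deduplicating on configuration in this way is one plausible route to fewer design variants, but it is an assumption for illustration rather than a description of either company's tool.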

Other datacentre operators could also stand to benefit from the technology, said Westbrock.

“The Librarian allows the owners and operators of any datacentre company to continuously play out the ‘what if’ scenarios,” he said. “And to do that with a minimal set of engineering people and costs. This allows those datacentre owners to run some different scenarios, and then continue improving the process without inhibiting their current design workflow.

“In the end, this allows them to keep up with changes in technology to meet the growing demand of their business,” said Westbrock. “It also frees up some of the staff from the Schneider team because we don’t need as many people running those designs and those individuals can help out in other functions with Facebook’s business. There are some great efficiencies [to be had] using The Librarian across all datacentre-type accounts.”

As an example of the efficiencies Facebook has achieved with The Librarian, Ronan pointed to how the tool had allowed the organisation to cut the number of panel design variants from 64 to about a dozen, which he described as a “huge improvement” for the firm’s global datacentre buildout plans.

“Our optimisation efforts are paying dividends,” he said. “We were effectively able to do more requests as our datacentre design and build schedule grew and accelerated, and we reduced the Schneider Electric programme support headcount by 38%, while increasing quality and decreasing schedule slippage.”

Reduction in headcount

The headcount on the Schneider Electric side fell from 135 team members to 88, said Ronan. “In the meantime, Facebook’s own internal design BMS team grew from four to what is now 14. This reduced our base project BMS costs by over 22% on average and created some flexibility to shift trained and skilled resources [around the business].”
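The team-size and panel-variant figures can be checked with a short percentage-change calculation. Taken at face value, a fall from 135 to 88 is a reduction of about 35%, a little below the 38% quoted by Ronan, which may have been measured on a different basis; the sketch simply reports the arithmetic. The panel figure uses 12 as a stand-in for "about a dozen".

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, as a percentage."""
    return (after - before) / before * 100

schneider_change = percent_change(135, 88)  # Schneider Electric programme support
facebook_change = percent_change(4, 14)     # Facebook internal BMS design team
panel_change = percent_change(64, 12)       # panel design variants (12 assumed for "about a dozen")

print(f"Schneider Electric headcount: {schneider_change:.1f}%")  # -34.8%
print(f"Facebook BMS design team: {facebook_change:+.0f}%")      # +250%
print(f"Panel design variants: {panel_change:.0f}%")             # -81%
```

In other words, the internal team more than tripled while the supplier team shrank by roughly a third, and the number of panel variants fell by around four-fifths.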

Reflecting on his own lessons from the project, he said the engagement with Schneider Electric highlighted the importance of having “reliable partners and strong relationships” that are built on “reciprocal trust” to get good results.

“Most of all, you have to have a one-team approach. We’re either all successful together, or we all fail together,” said Ronan.
