Salesforce.com, one of the largest cloud-based software providers in the world - and with annual revenue of more than $1.3bn (£860m) - has a unique take on datacentre infrastructure for cloud services.
While economy of scale is being touted as a key source of savings for cloud providers, the San Francisco-based provider of business software and the development platform Force.com has a minimal number of servers in just a few co-location cages. It provides services without server virtualisation, an approach that is contrary to the often-repeated mantra of virtualisation being a prerequisite for cloud or infrastructure sharing.
Yet Salesforce.com's infrastructure is shared, and few would deny that the way in which it delivers services is cloud-based.
"Everyone runs on a large, shared service," says Ariel Kelman, vice-president of platform product marketing at Salesforce.com. "You can think of, in effect, all our customers having one infrastructure stack and one version of software: then what we do is, at the logical layer, separate our customers."
This is an alternative to a single-tenant approach, in which someone has to provision and manage each customer's virtual or physical compute capacity. In such architectures, every set-up has its own set of software, which has to be updated periodically, one customer at a time.
"When we upgrade our service, we upgrade all our customers at the same time," Kelman says. Upgrades on Salesforce.com do not require any involvement from the customer. All custom apps a customer may have built, or any customised alterations they may have added to their sales or collaboration applications, keep running while infrastructure underneath them is upgraded. Minor patches are usually applied every day, with major releases taking place around three times per year.
"This idea of decoupling our release cycle from the release cycle the IT department has to deal with is probably one of the biggest innovations we have given to our customers," Kelman says. What allows Salesforce.com to eliminate the traditional upgrade cycles is its "secret sauce": the meta-layer, built on top of the physical infrastructure that manages the way in which the infrastructure supports each application and allows changes to be applied instantly across the board. This meta-layer is built on the company's intellectual property, developed over more than 10 years.
"When you are customising all our applications or building new applications, what you are really doing is putting metadata into a database. Whenever you click on an application that is running our service, or if another system makes a web services integration call, our service will look at your metadata and, in real time, render your application and give you what is unique for you," Kelman says.
The customer retains full control of the service's functionality, while the physical infrastructure and the software that manages it are completely abstracted. Functions such as performance tuning, back-up and disaster recovery are all performed without the customer needing to be aware of them.
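The metadata-driven rendering Kelman describes can be illustrated with a minimal sketch. Everything here is hypothetical: the tenant names, the `METADATA` and `RECORDS` structures and the `render` function are assumptions made for illustration, not Salesforce.com's actual design. The point is only that one shared code path and one shared data store can produce a different application for each customer by consulting that customer's metadata at request time.

```python
# Hypothetical per-tenant metadata: which fields each customer has
# configured for a given object. In the article's terms, customising
# an application means writing rows like these into a database.
METADATA = {
    "acme": {"Contact": ["name", "email", "region"]},
    "globex": {"Contact": ["name", "phone"]},
}

# Hypothetical shared data store: all tenants' rows live together and
# are separated only logically, by the tenant key.
RECORDS = [
    {"tenant": "acme", "object": "Contact",
     "values": {"name": "Ann", "email": "ann@example.com", "region": "EMEA"}},
    {"tenant": "globex", "object": "Contact",
     "values": {"name": "Bob", "phone": "555-0100", "email": "bob@example.com"}},
]


def render(tenant, obj):
    """Build the tenant's view of an object from its metadata at request time."""
    fields = METADATA[tenant][obj]
    return [
        {field: record["values"].get(field) for field in fields}
        for record in RECORDS
        if record["tenant"] == tenant and record["object"] == obj
    ]


if __name__ == "__main__":
    print(render("acme", "Contact"))
    print(render("globex", "Contact"))
```

Because the application logic lives in `render` and not in per-customer code, upgrading that one function would upgrade every tenant at once, which is the property the article attributes to the meta-layer.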
A multi-tenancy architecture sets an unusually high quality bar for any software the company's developers produce, as everything is applied automatically across the entire infrastructure. "The other side of our multi-tenant architecture is, if we deploy a bug, it is experienced by all our customers at the same time. If we get a mistake out there, it is public to the world instantly and impacts everyone."
The meta-layer also enables efficient use of the company's assets. "We have a fraction of the servers that are required per customer per app than you would have for a single-tenant hosting approach - whether it is inside the firewall or virtual servers in the cloud," Kelman says.
Salesforce.com servers currently live in four datacentres - all of them co-location facilities from commercial providers, says Claus Moldt, the company's vice-president of technical operations. Of the four facilities, one is for research and development and is located in the San Francisco datacentre operated by 365 Main. This facility is one of the five properties wholesale datacentre provider Digital Realty Trust bought from Rockwood Capital in June. The remaining three sites are running production environments and are hosted at Equinix datacentres in Silicon Valley, Virginia and Singapore. The company is expanding its datacentre footprint this year, adding wholesale 1MW suites from DuPont Fabros in Virginia and Chicago. The new sites are due to come online before the end of the year.
To serve its 77,000-plus customers and process an average of 300 million requests per day, Salesforce.com has 1,500 primary servers and another 1,500 identical machines for disaster recovery. "I am very, very certain, should we have a disaster, that things may not operate exactly the same if you do not have exactly the same equipment," Moldt says. "That is the only way you absolutely can know."
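For a sense of scale, the figures quoted above can be turned into rough averages. This is back-of-the-envelope arithmetic only: it assumes load is spread evenly over the day and over all 1,500 primary servers, which the article does not claim, and peak rates would be considerably higher than the daily mean.

```python
REQUESTS_PER_DAY = 300_000_000   # average daily requests, from the article
PRIMARY_SERVERS = 1_500          # primary servers, from the article
SECONDS_PER_DAY = 24 * 60 * 60

# Mean request rate across the whole service.
avg_requests_per_second = REQUESTS_PER_DAY / SECONDS_PER_DAY

# Mean load per primary server, assuming (unrealistically) an even spread
# across every server regardless of its role.
requests_per_server_per_day = REQUESTS_PER_DAY / PRIMARY_SERVERS
requests_per_server_per_second = avg_requests_per_second / PRIMARY_SERVERS

if __name__ == "__main__":
    print(f"{avg_requests_per_second:,.0f} requests/s on average")
    print(f"{requests_per_server_per_day:,.0f} requests per server per day")
    print(f"{requests_per_server_per_second:.2f} requests/s per server")
```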
This is another part of the provider's value proposition, he adds. Setting up a virtual environment and then replicating it for disaster recovery services would be costly for customers to do themselves. The provider allows them to avoid that expense by replicating the entire infrastructure.
The three production locations do not currently operate as disaster recovery (DR) sites for each other, and few of the services running out of the DR datacentres are active. "We most likely will head there toward the end of the year," Moldt says. The goal is eventually to have all services run as "active" in each of the facilities.
Keeping up with growth
Moldt has a stringent capacity-planning routine. Every week he does a 12-month forecast with his IT organisation. "Every week we know what we think will happen 12 months ahead," he says.
His team does a three-year forecast every quarter. "We always understand how the growth is happening. I can be fairly accurate in my prediction as we forecast growth over an extended period of time. I can put additional infrastructure in place well in advance and move things around, and add datacentre space or power, because we understand our environment so well," says Moldt.
Ultimately, Moldt wants to be able to lose half of the infrastructure during a peak-usage period and still operate without capacity losses on the main infrastructure. The goal is to have no single point of failure within the base infrastructure itself, while also having an exact replica of the base infrastructure in DR sites.
About 40% of the company's production infrastructure is currently utilised at peak times and Moldt works to prevent utilisation from ever going over 50%.
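The 50% ceiling follows from the goal of surviving the loss of half the infrastructure at peak. A minimal sketch of the arithmetic, assuming that the load of the lost servers is redistributed evenly over the survivors and that utilisation scales linearly with load (both are simplifications, and the function name is illustrative):

```python
def utilisation_after_loss(peak_utilisation, fraction_lost):
    """Utilisation of the surviving servers once a fraction of capacity is lost.

    Assumes the full peak load is redistributed evenly over what remains.
    """
    if not 0 <= fraction_lost < 1:
        raise ValueError("fraction_lost must be in [0, 1)")
    return peak_utilisation / (1 - fraction_lost)


if __name__ == "__main__":
    # Current peak of 40%: losing half the servers leaves the rest at 80%.
    print(utilisation_after_loss(0.40, 0.5))
    # At the 50% ceiling the survivors would be exactly full.
    print(utilisation_after_loss(0.50, 0.5))
    # Anything above the ceiling could not be absorbed.
    print(utilisation_after_loss(0.60, 0.5))
```

Under these assumptions, any peak utilisation above 50% would push the surviving half beyond 100%, so 50% is the highest peak at which the stated goal can still be met.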
The meta-layer plays a key role in managing utilisation. A significant amount of compute cycles becomes available during night time, for example. The architecture allows for massive parallel requests to be run during off-peak hours, and some customers are allowed to use the available resources to do patch-up loads. Controls are in place to prevent one customer from monopolising the extra resources, however. Each customer gets an equal share of these extra resources.
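The article does not say how the equal share is enforced. One simple way to picture it is a per-customer cap on spare capacity, sketched below. The function name, the abstract capacity units and the decision not to redistribute unused share are all assumptions made for illustration.

```python
def allocate_off_peak(spare_units, requested):
    """Split spare off-peak capacity equally among the requesting customers.

    spare_units: spare capacity in arbitrary whole units (hypothetical).
    requested:   mapping of customer name to units requested.
    Each customer receives at most an equal share, and never more than it
    asked for. Unused share is not handed to anyone else in this sketch.
    """
    if not requested:
        return {}
    equal_share = spare_units // len(requested)
    return {customer: min(wanted, equal_share)
            for customer, wanted in requested.items()}


if __name__ == "__main__":
    grants = allocate_off_peak(90, {"a": 100, "b": 10, "c": 50})
    print(grants)
```

Here customer `a` asks for more than the whole pool but is held to the same cap as everyone else, which is the behaviour the controls described above are meant to guarantee.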
Since 2005, Salesforce.com has been expanding IT capacity by adding standard points of deployment (PoDs) to the infrastructure, each of which consists of application and database servers and serves a set of customers. Besides this horizontal expansion, the PoDs can be expanded vertically by installing additional servers without architectural changes.
"I can choose to go to 48-core versus 24-core as part of the database, and that is the benefit of the x86 architecture. It is so flexible," says Moldt.
Location is everything
Location decisions are driven by sales and marketing considerations more than by anything else. Datacentres that Salesforce.com chooses to use must be carrier-neutral and must have tier-three or tier-four redundancy level in the electrical infrastructure.
They must also be near major internet hubs. The company has about five transit providers that can route traffic via optimal paths anywhere around the world, and multiple MPLS (multi-protocol label switching) backbone providers.
"They operate as a mesh," Moldt says about the production datacentres. "We have, basically, MPLS backbones, so I could choose to put my infrastructure anywhere just based on capacity needs and on how I want to structure the infrastructure itself."
Moldt explains that what they are looking for is "where we think is the safest place that still meets our criteria". The effect of physical distance on response time does not play a role.
Building a cloud architecture
In Moldt's opinion, the difference between building an architecture to support a traditional service and building one to support a cloud service is in the cloud environment's flexibility needs. A typical service provider looks at each customer separately and builds an architecture according to their needs.
"And then you hope you are right, because you truly don't understand what is the capability and capacity until you start running it. You can model it well in advance, but until you start running it you truly don't know," says Moldt.
A platform that supports a cloud-based service has to be able to support any type of application on the same infrastructure. Salesforce.com's meta-layer determines how to run each application, providing the needed flexibility. The company's production environment does not use virtualisation in the traditional sense of spinning up application servers. "That is just merging capacity," Moldt says.
"With the small number of devices we have [in production], we do not see a significant benefit from virtualisation."
Virtualisation is used heavily in its traditional form in the research and development datacentre for development and testing, with about 8,000 virtual machines spun up daily.
Within the production environment, virtualisation is only done on the back end, in the database layer. Once a customer signs up, they are allocated a slot in one of the multi-tenant databases. When they send a request, it can be sent to any of the application servers associated with that particular database.
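The allocation and routing just described can be sketched in a few lines. This is an illustration under stated assumptions, not the company's implementation: the `Pod` and `Router` classes, the least-loaded placement rule at sign-up and the round-robin choice of application server are all hypothetical. The article states only that a customer is given a slot in a multi-tenant database and that requests may go to any application server associated with it.

```python
import itertools


class Pod:
    """A multi-tenant database together with its associated app servers."""

    def __init__(self, name, app_servers):
        self.name = name
        self.app_servers = list(app_servers)
        self.customers = set()
        # Round-robin rotation is an assumption; any server would do.
        self._rotation = itertools.cycle(self.app_servers)

    def next_server(self):
        return next(self._rotation)


class Router:
    """Maps each customer to a pod and picks a server for each request."""

    def __init__(self, pods):
        self.pods = list(pods)
        self.pod_of = {}

    def sign_up(self, customer):
        # Hypothetical placement rule: the pod with the fewest customers.
        pod = min(self.pods, key=lambda p: len(p.customers))
        pod.customers.add(customer)
        self.pod_of[customer] = pod
        return pod.name

    def route(self, customer):
        # Any app server attached to the customer's database may serve it.
        return self.pod_of[customer].next_server()


if __name__ == "__main__":
    router = Router([Pod("pod1", ["app1a", "app1b"]),
                     Pod("pod2", ["app2a", "app2b"])])
    print(router.sign_up("acme"), router.sign_up("globex"))
    print([router.route("acme") for _ in range(3)])
```

The customer is pinned to a database, but not to a machine in front of it, so application servers can be added to a pod without touching the customer-to-database mapping.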
"We manage the capacity and, frankly, whether it is virtualised or not does not really matter," Moldt says. "We can scale up and scale down, so we have built in some level of elasticity into the basic functions themselves." Infrastructure running the CRM product is similar to that running the application development platform Force.com. While the two are separated, they act exactly the same.
This article was originally published in DatacentreDynamicsFOCUS magazine. The DatacenterDynamics 2010 London conference, Designing for Demand, takes place on 9 and 10 November 2010 at the Lancaster London Hotel. On day one, Frank Guerrera, vice-president of technical operations at Salesforce.com, will deliver a keynote presentation on "Implementing a demand-response datacentre and infrastructure strategy".