
How resilient is the public cloud?

The recent downtime that affected some Azure services following an update illustrates the precarious balance CIOs face in the cloud era

This article can also be found in the Premium Editorial Download: Computer Weekly: The future of networking

The outage that took some Microsoft Azure services offline after an update in November 2014 illustrates the precarious juggling act CIOs must perform in the cloud era.

While public cloud operators will argue that they experience far less downtime than in-house IT, the scale at which Microsoft and Amazon Web Services (AWS) operate means any hitch has the potential to affect millions of people.

By migrating IT services to the public cloud, organisations can benefit from lower IT infrastructure costs, free up resources in their own datacentres and gain the flexibility to support spikes in demand.

Cloud is irresistible

Analyst firm Forrester predicts Microsoft will make more profit from cloud than from on-premise software. In fact, Windows Server 10 – the next release of the company’s server operating system – is being positioned as a cloud operating system.

In the Forrester report “The days of fighting the cloud are over”, analyst James Staten urged CIOs to assess how application integration can be achieved on Azure.

“Capitalise on opportunities where you can better leverage Microsoft’s latest innovations to deliver greater business value faster,” he said.

Now, it would seem, is the time to get a good deal from Microsoft, especially if the purchase involves Azure or Office 365. But, as the incident on 19 November highlighted, Azure can fail – and will fail. The latest outage was caused by a software bug that, through human error, was rolled out across several Azure regions in quick succession.

“Unfortunately, the issue was widespread, since the update was made across most regions in a short period of time, due to operational error, instead of following the standard protocol of applying production changes in incremental batches,” Microsoft corporate vice-president Jason Zander wrote on the Azure blog.
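Zander’s explanation contrasts two ways of shipping a change: all regions in a short period, or incremental batches with a check between each. The sketch below is a minimal, hypothetical illustration of the incremental approach, not Microsoft’s actual deployment tooling; the region names and the `deploy` and `healthy` callbacks are placeholders assumed for the example.

```python
from typing import Callable, List, Sequence


def staged_rollout(
    regions: Sequence[str],
    batch_size: int,
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
) -> List[str]:
    """Apply a change to `regions` in batches of `batch_size`.

    After each batch, every region in it must pass `healthy`; otherwise the
    rollout stops and later batches never receive the change.
    Returns the regions that were updated.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    updated: List[str] = []
    for start in range(0, len(regions), batch_size):
        batch = list(regions[start:start + batch_size])
        for region in batch:
            deploy(region)
            updated.append(region)
        # Gate on the batch's health before touching the next batch.
        if not all(healthy(region) for region in batch):
            break
    return updated


# Hypothetical regions; a faulty change leaves every region it touches unhealthy.
def no_op(region: str) -> None:
    pass


def faulty(region: str) -> bool:
    return False


regions = ["region-a", "region-b", "region-c", "region-d"]
print(staged_rollout(regions, 1, no_op, faulty))             # ['region-a']
print(staged_rollout(regions, len(regions), no_op, faulty))  # all four regions
```

With one region per batch the bad change is contained to the first region; pushing everything as a single batch, which is roughly what happened in November, exposes every region before any check runs.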

Cloud availability presents CIO challenges

Despite contracts that offer “three nines” (99.9%) availability or higher, attaining a highly resilient cloud service is hit and miss.
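To put an availability figure in concrete terms, the downtime it permits is simply the unavailable fraction multiplied by the length of the period. This is a minimal sketch; the function names are illustrative, and the default period assumes a 365-day year.

```python
def allowed_downtime_hours(availability: float, period_hours: float = 365 * 24) -> float:
    """Maximum downtime an availability target permits over a period
    (default: one 365-day year)."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * period_hours


def nines(n: int) -> float:
    """Availability expressed as a count of nines, e.g. 3 nines is 0.999."""
    return 1.0 - 10.0 ** -n


print(round(allowed_downtime_hours(nines(3)), 2))               # 8.76 hours a year
print(round(allowed_downtime_hours(nines(3), 30 * 24) * 60, 1))  # 43.2 minutes a month
```

So a three-nines contract can be honoured even if the service is down for almost nine hours a year, or about 43 minutes in a 30-day month.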

“In my experience, IT managers expect the cloud service provider to cater for resilience,” said Geoff Connell, director of ICT at oneSource, the shared service of Havering and Newham councils.

Connell recently procured a hosted cloud enterprise resource planning (ERP) service from Skyscape, via Capgemini, on behalf of seven London boroughs, but he admitted that high availability has been somewhat problematic. “Although it is relatively early days, we have not been impressed with the resilience so far,” he said.

By contrast, when Oracle hosted the councils’ ERP system in its Oracle On Demand service out of Houston, Texas, availability was solid.

“Oracle’s arrangements to failover in the event of any service outage was excellent, resulting in really strong performance and availability,” said Connell. “Ultimately, there’s no substitute for making sure the tender specification or the cloud service specification fully covers the service requirements.”

Prepare for cloud failure

Getting a straight answer on best practice for spreading in-house IT services across multiple cloud providers, to minimise disruption if one of them fails, remains difficult.

Ovum principal analyst Michael Azoff said users want easily transferable workloads – not just across a single provider’s datacentres, but across many public cloud providers, so they can pick the best service.

“Even within a single cloud provider we see examples where a business user has no failsafe strategy of balancing across different datacentres. Of course, that does not help where an error affects the total service,” he said.

Azoff suggested CIOs should expect downtime and focus on what backup they have in place.
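Azoff’s caveat can be made precise with the standard availability arithmetic for redundant components. The sketch below is a simplified model, not a provider’s published method: it assumes replica failures are independent, and the figures are illustrative rather than taken from any contract. A shared dependency, such as one update process that reaches every datacentre, sits in series with the replicas and caps what redundancy can deliver.

```python
from typing import List


def parallel_availability(replicas: List[float]) -> float:
    """Availability of a service that stays up while at least one replica is up.
    Assumes replicas fail independently of one another."""
    p_all_down = 1.0
    for a in replicas:
        if not 0.0 <= a <= 1.0:
            raise ValueError("availability must be between 0 and 1")
        p_all_down *= 1.0 - a
    return 1.0 - p_all_down


def with_shared_dependency(shared: float, replicas: List[float]) -> float:
    """Replicas in series with a component they all depend on
    (for example, a single deployment pipeline)."""
    return shared * parallel_availability(replicas)


two_sites = [0.999, 0.999]                     # illustrative figures
print(parallel_availability(two_sites))        # close to 0.999999
print(with_shared_dependency(0.999, two_sites))  # no better than 0.999
```

Two independent 99.9% sites give roughly six nines on paper, but once both depend on a common component with 99.9% availability, the combined figure falls back below three nines. That is the “error affects the total service” case Azoff describes.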

Challenges of mirroring PaaS

In theory, it should be easier to replicate infrastructure as a service (IaaS), since technologies such as VMware vMotion and Microsoft Hyper-V support live migration of running virtual machines. A business continuity service could then run on another public or private cloud to provide resilience if the main IaaS provider experiences an outage.

But IaaS represents an entry-level commodity, where public cloud providers beat each other up on price. The real value is in platform as a service (PaaS). Amazon has been steadily fleshing out its AWS platform to entice developers to build AWS-optimised applications. Splunk, Pegasystems and Informatica were among a number of independent software companies that used the AWS re:Invent 2014 conference in Las Vegas to announce that they are building applications on top of AWS.

Microsoft’s strategy is a common Windows programming model that spans on-premise systems and the public cloud, giving in-house developers and third-party software providers a way to incorporate Azure into their cloud applications.

While applications built on top of public cloud application programming interfaces (APIs) can benefit from the deep integration available from a PaaS, the risk is that the environment is far harder to mirror in the event of a failure.

Room for cloud improvement

“We’re still in Cloud 1.0,” according to IDC consulting manager Andy Buss, “with few applications and services truly architected and written for the cloud with associated engineering dedicated to link failures and assumptions of fault tolerance. I expect platform and software as a service to get steadily better as more apps are written with these ‘trust nothing’ approaches in mind and reliability is assumed to be dodgy.”
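One common way to write the “trust nothing” behaviour Buss describes into application code is to treat a failed connection as routine: retry briefly, then fall back to an alternative endpoint. The sketch below is a minimal, hypothetical illustration; `call_with_failover` and the `primary` and `secondary` functions are invented for the example and stand in for calls to two separate providers or regions.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def call_with_failover(
    endpoints: Sequence[Callable[[], T]],
    attempts_per_endpoint: int = 2,
) -> T:
    """Call each endpoint in order, retrying on ConnectionError, and return
    the first successful result. Raises RuntimeError if every attempt fails."""
    if attempts_per_endpoint < 1:
        raise ValueError("attempts_per_endpoint must be at least 1")
    last_error: Optional[Exception] = None
    for endpoint in endpoints:
        for _ in range(attempts_per_endpoint):
            try:
                return endpoint()
            except ConnectionError as exc:
                # A failed link is expected, not exceptional: record it and carry on.
                last_error = exc
    raise RuntimeError("all endpoints failed") from last_error


# Hypothetical endpoints: the primary provider is down, the secondary answers.
def primary() -> str:
    raise ConnectionError("primary unreachable")


def secondary() -> str:
    return "served by secondary"


print(call_with_failover([primary, secondary]))  # served by secondary
```

Note that this only helps if the secondary genuinely offers the same service, which is exactly what becomes hard once an application depends on one provider’s PaaS APIs.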

The Windows Azure Pack offers datacentre managers Azure-like systems management for on-premise private cloud deployments.

But clearly it is not practical to run an identical mirror of public cloud-optimised applications on-premise.

So where does a CIO go for fault-tolerant cloud computing?

Under the leadership of its new CEO, Satya Nadella, Microsoft has aligned its strategy around cloud and mobile. The Azure downtime is clearly more than just a hitch – it is deeply embarrassing for Nadella and undermines the principles he has been pushing for Microsoft since he took over earlier this year.

But until Microsoft and other cloud providers offer a service level agreement with zero downtime, CIOs will have to continue to juggle the complexities of attaining high availability whenever there is a cloud element in their IT infrastructure.
