Disaster recovery is still vital in the software-defined datacentre era

Jon Toigo takes issue with virtualisation advocates who say the software-defined datacentre, with its high availability and clustering, does away with the need for disaster recovery

The rise of the software-defined datacentre has been accompanied by a myth: that disaster recovery provision is no longer needed because clustering and high availability can take its place.

That was the message from Jon Toigo, who spoke at TechTarget's Storage Decisions Disaster Recovery seminar in London earlier this month.

Toigo, CEO and managing principal of Toigo Partners International and chairman of the Data Management Institute, said the software-defined datacentre was akin to the mainframe.

Software-defined datacentre myths

Toigo was scathing about the claims made for software-defined datacentres, arguing that many of the purported benefits lack substance and that virtualisation often creates as many problems as it claims to solve.

“They dream of a return to the era of the IBM mainframe, but with a VMware label instead of IBM,” he said. He also outlined three key “lies” propagated by suppliers:

  • Virtualisation will solve all infrastructure problems.
  • Disaster recovery is no longer necessary with clustering and high availability.
  • Suppliers are working in customers’ best interests, despite the implicit assault on proprietary values in software-defined datacentre methods.

“Software has always defined my infrastructure,” said Toigo. “When has it not?”

“Now we have pools of compute, networks and storage capacity, but these need automation and management, and we find that all the same functions that were included in, say, the NIST service delivery layer for legacy datacentres are now in the virtualisation software management layer of the software-defined datacentre,” said Toigo.

Software-defined disruption to storage and networks

Chief among the knock-on effects of virtualisation, said Toigo, has been disruption to storage and networks.

“The software-defined datacentre doesn’t change the underlying infrastructure,” he said. “It just masks our ability to manage it. Hardware is still the foundation and we’re paying no attention to monitoring it. The virtualisation ecosystem is based on fixing problems that VMware creates.”

Toigo pointed, for example, to power savings and greater server utilisation being eaten up by unexpected extra costs for storage capacity, and by increased I/O contention and congestion in networks and storage fabrics.

When it came to software-defined storage, Toigo had harsh words for VMware’s VSAN initiative. He lambasted it for encouraging a move back to direct-attached storage (DAS) after IT departments had spent a decade or more moving to shared storage, and for the effective lock-in that results from VSAN working only with VMware workloads.

Disaster recovery for the software-defined datacentre

Despite all the changes wrought by the software-defined datacentre, what hasn’t changed, said Toigo, is the nature of disaster, which was the main practical focus of the seminar.

A disaster, he said, is usually something quite mundane that has escalated.

“Often it’s IT taking its eye off the ball – a hardware failure or software issue during patching, for example. What makes an everyday event, an inconvenience, into a disaster is not being prepared and letting an unplanned interruption go on for an unacceptable period of time,” said Toigo.

“Those that develop continuity plans fare better than those that don’t,” he added.

Toigo criticised the presumption of full redundancy between sites, with high availability and failover between them. This is the approach favoured by software-defined datacentre and virtualisation advocates, who say it makes disaster recovery unnecessary.

Sure, said Toigo, high availability, in which data recovery, re-hosting and network reconnection are baked into an automated infrastructure, is a very good thing.

But, he added, it is also the hardest approach to achieve, because of the gap between the promises of high availability and the reality.

“Failover doesn’t happen as smoothly as they say it does. Lots of data needed by an application doesn’t just come from one place. There is configuration data, log files, databases etc that an app needs to work when you’re re-hosting it. There can also be problems with network reconnection.”

In other words, to make it work you have to spend a lot of money setting it up and constantly testing it to ensure a smooth transition between datacentres. He asked whether it is appropriate to spend that kind of money on fully automated high availability for all apps.

“It assumes all apps are equally important and that all data must be mirrored across networks and that all of them need mission critical, always-on support,” said Toigo.

Not all data is created equal

Instead, given that fully active-active datacentres with failover are out of reach for most organisations, Toigo recommended an approach in which data is classified, storage is tiered, and data is restored according to its priority to the organisation.

“Applications are not all the same,” said Toigo. “They differ in terms of criticality to the business, their tolerance to downtime and priority of restore.”

His recommendations include the following:

  • A copy of data is made, and the media it is retained on (flash storage, performance disk, capacity disk or tape) reflects the criticality of the app the data supports.
  • The way data is protected matches application criticality.
  • Recovery should not commit users to a specific hardware platform, and re-hosting should use the minimum possible hardware configuration.
  • Reconnected networks should deliver baseline bandwidth and throughput.
  • Disaster recovery strategies should be as easy to test as possible.
  • Disaster recovery capabilities should allow the quickest possible restoration of critical apps within the budget allowed.
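To make the classification and tiering idea concrete, here is a minimal Python sketch. It is an illustration only, not a method Toigo presented: the four-level criticality scale, the App class, the max_downtime_hours field and the function names are all hypothetical assumptions. It maps each app's criticality to one of the four media tiers named above and orders restores by criticality, then by tolerance to downtime.

```python
from dataclasses import dataclass

# Hypothetical mapping from criticality level (1 = most critical) to the
# media tier holding the app's protected copy; the scale is an assumption.
MEDIA_BY_CRITICALITY = {
    1: "flash",
    2: "performance disk",
    3: "capacity disk",
    4: "tape",
}


@dataclass
class App:
    name: str
    criticality: int           # assumed scale: 1 (mission critical) to 4 (least critical)
    max_downtime_hours: float  # assumed measure of tolerance to downtime


def media_tier(app: App) -> str:
    """Return the media tier for an app's protected copy (KeyError if out of range)."""
    return MEDIA_BY_CRITICALITY[app.criticality]


def restore_order(apps: list[App]) -> list[str]:
    """Order restores by criticality, then by lowest tolerance to downtime."""
    ranked = sorted(apps, key=lambda a: (a.criticality, a.max_downtime_hours))
    return [a.name for a in ranked]


if __name__ == "__main__":
    # Hypothetical application inventory for illustration.
    apps = [
        App("payroll", 2, 24),
        App("orders", 1, 1),
        App("archive", 4, 168),
        App("email", 2, 4),
    ]
    print(restore_order(apps))  # ['orders', 'email', 'payroll', 'archive']
    print(media_tier(apps[1]))  # flash
```

Sorting on a (criticality, downtime tolerance) pair means that among equally critical apps, the one that can least afford to stay down is restored first. A real plan would also weigh cost, as the final recommendation about budget suggests.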
