The rise of the software-defined datacentre has been accompanied by the emergence of the myth that there is no need for disaster recovery provision, and that clustering and high availability can take its place.
Software-defined datacentre myths
Toigo was scathing about the claims made for software-defined datacentres, arguing that many of the purported benefits lack substance and that virtualisation often creates as many problems as it claims to solve.
“They dream of a return to the era of the IBM mainframe, but with a VMware label instead of IBM,” he said. He also outlined three key “lies” propagated by suppliers:
- Virtualisation will solve all infrastructure problems.
- Disaster recovery is no longer necessary with clustering and high availability.
- Suppliers are working in customers’ best interests, despite the implicit assault on proprietary values in software-defined datacentre methods.
“Software has always defined my infrastructure,” said Toigo. “When has it not?”
“Now we have pools of compute, networks and storage capacity, but these need automation and management and we find that all the same functions that were included in, say, the NIST service delivery layer for legacy datacentres are now in the virtualisation software management layer of the software-defined datacentre,” said Toigo.
Software-defined disruption to storage and networks
Chief among the knock-on effects of virtualisation, said Toigo, has been disruption to storage and networks.
“The software-defined datacentre doesn’t change the underlying infrastructure,” he said. “It just masks our ability to manage it. Hardware is still the foundation and we’re paying no attention to monitoring it. The virtualisation ecosystem is based on fixing problems that VMware creates.”
Toigo said that benefits such as power savings and greater server utilisation are being eaten up by unexpected extra costs for storage capacity, and by increased I/O contention and congestion in networks and storage fabrics.
When it came to software-defined storage, Toigo had harsh words for VMware’s VSAN initiative. He lambasted it for encouraging a move back to direct-attached storage (DAS) after IT departments had spent a decade or more moving to shared storage, and for the effective lock-in that results from VSAN working only with VMware workloads.
Disaster recovery for the software-defined datacentre
Despite all the changes wrought by the software-defined datacentre, what hasn’t changed, said Toigo, is the nature of disaster, and this was the main practical focus of the seminar.
A disaster, he said, is usually something quite mundane that has escalated.
“Often it’s IT taking its eye off the ball – a hardware failure or software issue during patching, for example. What makes an everyday event, an inconvenience, into a disaster is not being prepared and letting an unplanned interruption go on for an unacceptable period of time,” said Toigo.
“Those that develop continuity plans fare better than those that don’t,” he added.
Toigo lambasted the presumption of full redundancy between sites, with high availability and failover between them. This is the favoured approach of software-defined datacentre and virtualisation advocates, who say it makes disaster recovery redundant.
Sure, said Toigo, high availability, where data recovery, re-hosting and network reconnection are baked into an automated infrastructure, is a very good thing.
But, he added, it is also the most difficult thing to achieve, because of the gap between the promises of high availability and the reality.
“Failover doesn’t happen as smoothly as they say it does. Lots of data needed by an application doesn’t just come from one place. There is configuration data, log files, databases etc that an app needs to work when you’re re-hosting it. There can also be problems with network reconnection.”
In other words, making it work means spending a lot of money on setting it up and on constantly testing it to ensure a smooth transition between datacentres. Toigo asked whether it is appropriate to spend this kind of money on fully automated high availability for all apps.
“It assumes all apps are equally important and that all data must be mirrored across networks and that all of them need mission critical, always-on support,” said Toigo.
Not all data is created equal
Instead, given that for most organisations fully active-active datacentres with failover are out of reach, Toigo recommended an approach in which data is classified and storage is tiered, and data is restored according to its level of priority to the organisation.
“Applications are not all the same,” said Toigo. “They differ in terms of criticality to the business, their tolerance to downtime and priority of restore.”
His recommendations were:
- A copy of data is made, and the type of media it is retained on reflects the criticality of the app the data supports: flash storage, performance disk, capacity disk or tape.
- The way data is protected should match application criticality.
- Recovery shouldn’t commit users to a specific hardware platform, and re-hosting should use the minimum possible hardware configuration.
- Reconnected networks should deliver baseline bandwidth and throughput.
- Disaster recovery strategies should be as easy to test as possible.
- Disaster recovery capabilities should allow for the quickest possible restoration of critical apps within the budget allowed.
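To make the classification idea concrete, here is a minimal sketch in Python of how applications might be tiered and then queued for restore. It is an illustration only, not something presented at the seminar: the four-level criticality scale, the `App` fields, the mapping of tiers to media and the example applications are all assumptions. Only the four media types come from Toigo's list.

```python
from dataclasses import dataclass

# Assumed mapping: tier 1 is the most critical. The media names are the
# four types mentioned in the recommendations; the tier numbers are invented.
MEDIA_BY_TIER = {
    1: "flash storage",
    2: "performance disk",
    3: "capacity disk",
    4: "tape",
}


@dataclass(frozen=True)
class App:
    """A hypothetical application record used for recovery planning."""
    name: str
    criticality: int           # 1 (mission critical) to 4 (archival), assumed scale
    max_downtime_hours: float  # tolerance to downtime, assumed unit


def backup_media(app: App) -> str:
    """Pick the media for the protected copy from the app's criticality tier."""
    return MEDIA_BY_TIER[app.criticality]


def restore_order(apps: list[App]) -> list[App]:
    """Restore the most critical apps first; break ties by downtime tolerance."""
    return sorted(apps, key=lambda a: (a.criticality, a.max_downtime_hours))


# Invented example inventory.
apps = [
    App("payroll", criticality=2, max_downtime_hours=24),
    App("order-entry", criticality=1, max_downtime_hours=1),
    App("email-archive", criticality=4, max_downtime_hours=168),
    App("intranet-wiki", criticality=3, max_downtime_hours=72),
]

for position, app in enumerate(restore_order(apps), start=1):
    print(f"{position}. {app.name}: copy kept on {backup_media(app)}")
```

The point of the sketch is that restore priority and media choice both fall out of a single classification step, so only the top tier needs the expensive always-on treatment.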