Editor's note: This is part 1 of Mike Laverick's discussion on virtual disaster recovery and VMware Site Recovery Manager (SRM). Part 2 explains the many features of VMware Site Recovery Manager.
When I first became involved in virtualisation back in 2003 and 2004, people used to ask me if anyone actually used virtualisation in production. Most of the time, their plans were to initially restrict its usage to test and development environments and disaster recovery (DR). Quite rightly, I guess, they saw these environments as representing low-risk areas where they could adopt the technology.
However, when they said this, I would always give them a wry smile. Partly because I heard the statement so often, and partly because secretly I knew that once they had virtualisation in these sectors, it was inevitable that they'd soon have virtualisation in the production space. I knew this for two main reasons: first, the technology is so good that you would be bonkers not to want to use it; and second, once you have a virtual DR location, it naturally follows that you will want the production location to be as virtual as possible.
Back then, people would play around with the idea of endlessly moving their physical production facility to the virtual DR location. Over time, the physical-to-virtual (P2V) tools have metamorphosed into availability tools, prompted by a long-term decline in the core usage of P2V.
Things have certainly moved on: Companies have adopted so-called "virtualisation first" policies, and many IT shops are breaking through the psychologically significant barrier of having a majority of their environments virtualised.
The importance of SRM
It's also so much easier to keep your protected site in sync with your recovery site if you adopt storage array replication from the likes of EMC or NetApp. The early adopters had to write their own scripts to automate this process, until VMware released their Site Recovery Manager (SRM) technology. I really love the SRM product, and I wouldn't have written a book about the subject if I didn't think it had a long-term future.
At the moment, I'm re-writing the book to produce one on SRM 4.0 compatible with vSphere4, which was released late last year. And yes, there's even talk of a book after that. That said, the product is not without its "challenges," which I would like to discuss here, along with suggesting some thoughts about where I see the product going in the next couple of years. Personally, I see the SRM 4.0 release as essentially a maintenance release, designed to bring the original product into line with the vSphere4 release. The interesting things should be coming in the SRM 4.1 release.
One of the main barriers to adopting VMware SRM is the sheer cost of the stack required to make it work. You need vCenter, ESX hosts and SRM in the recovery site. I know this because I once asked my user group in London, which represents some of the top international companies operating in the "Square Mile," if anyone was using the product, and if they weren't, why not. Without fail, the vast majority said the cost was prohibitive. I dare say the reason is not so much the actual price but the fact these people have already invested emotionally and psychologically in their own home-brewed scripted solution.
Nevertheless, if you are building a classical active/passive solution where the recovery site is only used for DR purposes and is most likely a colocation facility some distance from the protected site, the vCenter and ESX hosts have to be licensed as if they were taking a regular production load. Some organisations can justify the cost of this by switching to a bi-directional model where the two sites are both production locations offering DR resources to each other.
But what if that isn't feasible because of DR, replication bandwidth or logistical restrictions? If I have two primary locations, one in London and the other in Istanbul, the bi-directional answer is not really going to help. It's not uncommon for people to license to one level in the production location (Enterprise+, for example) but adopt a lower SKU at the recovery site to try to save some precious dollars.
What I would love to see VMware do is modify their licensing model to allow for an all-you-can-eat approach to running vCenter and ESX. Of course, some might argue that could lead to abuse, but the VMware model for licensing is "trust"-based already, and most corporate due diligence and audits should pick up on abuses of that trust pretty quickly. Let's face it, most licensing shenanigans happen at the individual and SMB layer, not the kind of people who might instantaneously take DR to the levels that we do.
Shared site model
The new version of SRM supports a shared site model, where many production locations can be protected by many SRM servers running with just one vCenter/ESX host in the recovery site. It's a great feature, and again might help some (but not all) organisations reduce their licensing costs…so long as we don't get too paranoid and start thinking about multiple site failures in multiple locations occurring at the same time. The only downside of the shared site feature at present is that, to get to it, you must pass a custom switch to the SRM installer.
Why this is integrated into the installation is anyone's guess, but I have a feeling that VMware sees this functionality as being there for large hosting providers who will host many recovery sites using the same compute power. It's a very small gripe on my part, because I think the person running the installer should simply be asked during the install whether they are configuring a unidirectional (active/passive), bi-directional (active/active) or shared site (one-to-many) configuration.
Duplicating and backing up recovery plans
One missing link in the current SRM product is the ability to back up and restore your recovery plans to separate files. You can export your recovery plans for documentation purposes to many formats (.xml, .doc, .html), but there is no corresponding "import" feature.
It's also impossible at the moment to copy an existing recovery plan and use it as the basis for another. Why is this so important? Well, ordering virtual machines (VMs) in a recovery plan can take time, and the user interface (UI) to do so isn't especially slick (especially if you have many VMs).
Additionally, if you delete protection groups, which are the objects that map your replicated datastores containing VMs to the SRM product, all the VMs listed in the recovery plan are removed -- undoing all your hard work.
If you do this accidentally, then you might find yourself using a backup to roll back the SRM database. For me, that's the nub of the issue -- everything to do with SRM is in the SRM database (normally Microsoft SQL Server 2005). What SRM really needs is a backup and export/import facility that allows the SRM administrator to store the metadata that makes up the SRM configuration in a more readily accessible format.
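As an illustration only, the kind of export/import round trip described above could be sketched in Python as follows. Everything here is hypothetical: the `RecoveryPlan` structure, its fields and the three helper functions are invented names for the sake of the example, not part of SRM or any VMware API, and JSON is simply a convenient stand-in for a portable file format.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class RecoveryPlan:
    """Hypothetical stand-in for a recovery plan (not an SRM object)."""
    name: str
    protection_groups: List[str] = field(default_factory=list)
    vm_order: List[str] = field(default_factory=list)  # VMs in power-on order


def export_plan(plan: RecoveryPlan) -> str:
    """Serialise the plan to JSON text that can live outside the database."""
    return json.dumps(asdict(plan), indent=2)


def import_plan(text: str) -> RecoveryPlan:
    """Rebuild a plan from previously exported JSON text."""
    return RecoveryPlan(**json.loads(text))


def clone_plan(plan: RecoveryPlan, new_name: str) -> RecoveryPlan:
    """Copy an existing plan, VM ordering included, under a new name."""
    copy = import_plan(export_plan(plan))
    copy.name = new_name
    return copy
```

With something along these lines, a carefully ordered plan could be saved to a file, restored after an accidental deletion, or cloned as the basis for another plan without touching the database backup.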
ABOUT THE AUTHOR: Mike Laverick is a professional instructor with 15 years' experience in technologies such as Novell, Windows and Citrix, and he has been involved with the VMware community since 2003. Laverick is a VMware forum moderator and member of the London VMware User Group Steering Committee. In addition to teaching, Laverick is the owner and author of the virtualisation website and blog RTFM Education, where he publishes free guides and utilities aimed at VMware ESX/VirtualCenter users. In 2009, Laverick received the VMware vExpert award and helped found the Irish and Scottish user groups. Laverick has had books published on VMware Virtual Infrastructure 3, VMware vSphere4 and VMware Site Recovery Manager.