Editor's note: This is the second part of Mike Laverick's discussion on virtual disaster recovery and VMware Site Recovery Manager (SRM). Part 1 covered licensing costs, shared site model and duplicating and virtual disaster recovery backup plans.
Read Part 1: Virtual disaster recovery with VMware Site Recovery Manager
Command steps/PowerCLI support
In VMware Site Recovery Manager (SRM), it's possible to create steps that call out to other scripting engines, such as Microsoft PowerShell and VMware's PowerCLI. It's a must-have feature if you want a recovery plan that reflects all the nuances of your organisation. The trouble with command steps, however, is that you can only place them in certain parts of your recovery plan -- the main steps. They lack the granularity that would allow you to add a call-out to a script at any point.
Furthermore, the credentials under which the script executes are those of the local administrator account of the SRM host. Some organisations might have problems getting that through their security audit, and it also means you have to look at ways of storing vCenter credentials with your PowerShell scripts. On that subject, there are currently no PowerCLI cmdlets for SRM. Fortunately, I know from Carter Shanklin, the Product Manager of PowerCLI, that VMware is considering this. My hope is that VMware will PowerCLI-enable all its top-tier management products (Lab Manager, SRM, View), just as Microsoft has with its own.
To re-IP or not to re-IP
Site Recovery Manager (SRM) does contain a way of re-IPing virtual machines (VMs) as they are recovered at the recovery site. The network of the recovery site may well not correspond to the network of the protected site, hence the re-IP. The SRM re-IP process piggybacks off vCenter's "Guest Customization", using Microsoft Sysprep as the engine to re-IP the VM.
In SRM 1.0 Update 1, VMware added a bulk-administration utility called dr-ip-customizer to handle this via importing settings from a CSV file. It's quite a cute method. The bottleneck, however, is still the Microsoft Sysprep engine; it just wasn't designed for the purpose that VMware is using it for. In fairness to VMware, they were most likely forced to adopt this approach because of support restrictions from the Microsoft side of the fence. I think a scripted method that would use a CSV file together with Microsoft PowerShell or netsh would be a slick approach.
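To give a flavour of the CSV-plus-netsh idea, here is a minimal sketch in Python that turns a CSV of recovery-site settings into netsh command lines. It is not how SRM or dr-ip-customizer works. The column names (vm, nic, ip, mask, gateway, dns) and the function name build_netsh_commands are my own assumptions for illustration, and they do not match the dr-ip-customizer file format. How the commands get run inside each guest is left out entirely.

```python
import csv
import io


def build_netsh_commands(csv_text):
    """Return {vm_name: [netsh command, ...]} from CSV text.

    Assumed (hypothetical) columns: vm, nic, ip, mask, gateway, dns.
    """
    commands = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        nic = row["nic"]
        per_nic = [
            # Static address, mask and default gateway (gateway metric 1).
            f'netsh interface ip set address name="{nic}" static '
            f'{row["ip"]} {row["mask"]} {row["gateway"]} 1',
            # Primary DNS server for the same adapter.
            f'netsh interface ip set dns name="{nic}" static {row["dns"]}',
        ]
        commands.setdefault(row["vm"], []).extend(per_nic)
    return commands


# Made-up sample data for a single VM.
SAMPLE = """vm,nic,ip,mask,gateway,dns
web01,Local Area Connection,10.20.0.11,255.255.255.0,10.20.0.1,10.20.0.5
"""

if __name__ == "__main__":
    for vm, cmds in build_netsh_commands(SAMPLE).items():
        print(f"# {vm}")
        for cmd in cmds:
            print(cmd)
```

The point of generating plain command lines rather than applying them directly is that the output can be reviewed, or handed to whatever in-guest execution mechanism you trust, before anything is changed.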
Perhaps I'm making more of this than I really should, because generally I believe re-IPing VMs is a bad idea. I think organisations are better off looking at the case for using NAT, stretched VLANs or adjusting routing tables to handle this issue. On the VMware side, I would like to see them use their vShield technology or cross-network fencing (which appears in Lab Manager) to offer customers a way of bringing up a VM in the recovery site without a re-IP, while reducing or eliminating the need for the "VMware guys" to make requests of the "network guys."
Generally, SRM reacts very well to changes in the virtual data centre, such as renaming folders, resource pools, virtual machines and so on. This matters because one part of the SRM workflow is "inventory mapping": mapping vCenter objects in the protected site to vCenter objects in the recovery site.
There are some changes that take place in vCenter, however, that SRM doesn't handle that well. This has been an issue since SRM 1.0. A case in point is using cold-migration and SVMotion. Whilst moving a VM from non-replicated storage to replicated storage works seamlessly, currently SRM doesn't cope as well when a VM is relocated from one replicated datastore to another or away from a replicated datastore to a non-replicated datastore. Whilst workarounds for these issues do certainly exist, I had hoped they would be resolved with SRM 4.0.
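The three cases above can be written down as a small decision rule. The Python sketch below is only a restatement of the behaviour described in this article, not anything SRM exposes. The names MoveOutcome and classify_storage_move are hypothetical, and the fourth case (a move between two non-replicated datastores) is not covered by the article, so it is labelled as such.

```python
from enum import Enum


class MoveOutcome(Enum):
    HANDLED = "handled seamlessly"
    NEEDS_WORKAROUND = "needs a manual workaround"
    NOT_COVERED = "not covered in the article"


def classify_storage_move(source_replicated, target_replicated):
    """Classify a cold migration or SVMotion by the replication
    status of the source and target datastores."""
    if not source_replicated and target_replicated:
        # Non-replicated -> replicated: described as working seamlessly.
        return MoveOutcome.HANDLED
    if source_replicated:
        # Replicated -> another replicated datastore, or replicated ->
        # non-replicated: both described as problem cases.
        return MoveOutcome.NEEDS_WORKAROUND
    # Non-replicated -> non-replicated: the article does not discuss it.
    return MoveOutcome.NOT_COVERED


if __name__ == "__main__":
    for src in (False, True):
        for dst in (False, True):
            outcome = classify_storage_move(src, dst)
            print(f"source replicated={src}, target replicated={dst}: {outcome.value}")
```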
One of the issues some organisations had with SRM 1.0, and will have with SRM 4.0, is the lack of an automated failback. Now, let's get something clear from the very beginning. You can do failback with SRM. Once you have done it a few times, the process is actually quite easy. It just isn't a next-next-next wizard. This lack of automation has deterred some people from deploying SRM. For some, testing their DR plan involves running the DR plan for real, and carrying out the failback for real.
At this moment, the best things on offer are failback plug-ins from the storage vendors. I'm sure the next step will be VMware adding a formal failback process to SRM. I'm not really sure how that will look and feel, as there are as many caveats and approaches to failback as there are to failover, and the same risks if the failback is unsuccessful.
So while triggering your failover when disaster happens is pretty much a no-brainer (what choice do you have?), the failback process is likely to take some time. Say, for example, you lost your primary site because of fire. After repair work has been completed, perhaps you decide it is safe to return. That could mean a new server room, new servers and the synchronisation of terabytes' worth of data to a new storage array.
The future: Taking the R out of SRM
So where is SRM going in the long term? I firmly believe that some, but not all, of the issues I have raised will be addressed by future product releases. The next version of SRM, version 4, is likely to be a big one. I think the long-term strategy of VMware will be to take the R out of SRM. Although VMware Site Recovery Manager began its life as a DR automation tool, it will gradually evolve into a Site Manager tool.
I think VMware will want to roll Site Manager into a technology that facilitates cloud management, allowing you to move some or all of your VMs from an internal cloud to an external cloud and perhaps even assisting in the move of VMs from one external cloud provider to another. If VMware's vCloud Express project takes off, there would be a common platform to do that -- vSphere -- with a common tool to facilitate it -- Site Manager.
ABOUT THE AUTHOR: Mike Laverick is a professional instructor with 15 years' experience in technologies such as Novell, Windows and Citrix, and he has been involved with the VMware community since 2003. Laverick is a VMware forum moderator and member of the London VMware User Group Steering Committee. In addition to teaching, Laverick is the owner and author of the virtualisation website and blog RTFM Education, where he publishes free guides and utilities aimed at VMware ESX/VirtualCenter users. In 2009, Laverick received the VMware vExpert award and helped found the Irish and Scottish user groups. Laverick has had books published on VMware Virtual Infrastructure 3, VMware vSphere 4 and VMware Site Recovery Manager.