Boost vCenter server availability with new tool

Our VMware expert takes an in-depth look at the powerful management tool vCenter Server.

For many years I've been a virtual guy. I haven't installed Windows on a physical machine since 2004. As a consequence, I've only ever run VMware vCenter Server inside a virtual machine (VM), and I've been an outspoken advocate of doing so for some time.

It's perfectly possible for vCenter Server to reside within a cluster that it manages, so that if the physical ESX host it is running on fails, it is restarted on another ESX host. In fact, I have been largely skeptical about the use of third-party availability tools for maintaining vCenter Server uptime.

It's worth saying that my view is not one shared by many people. Many VMware customers persist in running their vCenter Server on a physical server; according to VMware, the figure is about 55% to 65% of customers. Back in 2005 that was actually a VMware recommendation, which probably accounts for why many VMware shops persist in this practice to this day. In light of these realities, customers may be unable to protect their vCenter Server from unwanted downtime using only VMware clustering technologies such as High Availability and Fault Tolerance, which protect virtual machines rather than physical servers.

As for me, as time progressed, I've become increasingly anxious that whilst vCenter Server started its life as a relatively simple management tool, it has evolved over a number of years into a much bigger beast -- especially when customers begin using it to run other infrastructure components that are dependent on its availability.

In the past I wouldn't have said that vCenter was a single point of failure, and strictly speaking that is still true today. But once you have a virtual desktop infrastructure (VDI) solution in the shape of VMware View, and a disaster recovery solution in the form of VMware Site Recovery Manager, you do begin to wonder what would happen to the higher-level infrastructure if vCenter Server failed.

In the past five years, VMware have been investing a lot of their research and development in creating additional management tools and applications that sit atop vCenter Server. In short, if the vCenter Server service(s) or system fails, then these applications also fail -- vCenter Server has evolved to become a single point of failure from a management and application perspective.

How to increase the availability of vCenter Server
This year I began investigating ways in which the availability of vCenter Server could be increased, and reassessing my attitudes. This recently became more pressing with the release of vSphere 4.1, where VMware stated for the first time (since vCenter 1.0 Patch-Level 4) that using Microsoft Cluster Service (MSCS) to provide high availability for vCenter Server was not qualified by VMware.

This doesn't mean that MSCS won't work; far from it. But if you have issues with an MSCS-enabled vCenter Server installation, you will be on your own from a support perspective.

The position is very similar to Microsoft's early support statements surrounding virtualisation. To gain support from VMware, you would need to demonstrate that the clustering component was not the source of the problem, just as VMware customers once had to demonstrate that their problems with Microsoft Windows were not caused by virtualisation.

There are many reasons why VMware might take this position. But I think the main one is that MSCS is not their technology, and they perhaps don't want to take the hit to their support costs in helping customers overcome problems with a competitor's product, especially since they now have their own availability solution in the shape of an OEM version of NeverFail, which they call vCenter Server Heartbeat (vCSHB). One of the attractions of vCSHB over conventional clustering is that all the work is done inside the guest operating system. This means a simpler configuration -- there are no shared volumes, quorums and so on to deal with. Additionally, traditional clustering requirements for VMs often create conflicts with other vSphere features such as vMotion, HA and DRS. With vCSHB, all the activity is inside the guest operating system, so these features work without a problem.

vCSHB works by creating a clone of both the vCenter Server and the SQL server, and then keeping the primary and secondary vCenter Servers in sync through continuous asynchronous replication. The solution supports three setups: a completely virtual configuration, a completely physical configuration, and a hybrid model where the primary might be a physical machine and the secondary a virtual machine, to reduce the cost of n+1 redundancy.
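To make the term "asynchronous replication" concrete, here is a minimal Python sketch of the general idea: the primary acknowledges a write immediately and a background worker ships it to the secondary a little later. This is only an illustration of the concept, not a description of how vCSHB is implemented; the class and method names (AsyncReplicator, write, wait_until_synced) are hypothetical.

```python
import queue
import threading


class AsyncReplicator:
    """Toy model: the primary accepts writes at once and ships them later."""

    def __init__(self):
        self.primary = {}
        self.secondary = {}
        self._pending = queue.Queue()
        # Hypothetical background shipper; a real product would send the
        # changes over a dedicated network link instead of a local queue.
        worker = threading.Thread(target=self._ship, daemon=True)
        worker.start()

    def write(self, key, value):
        # Apply the change locally and return without waiting for the secondary.
        self.primary[key] = value
        self._pending.put((key, value))

    def _ship(self):
        # Drain queued changes to the secondary copy, in arrival order.
        while True:
            key, value = self._pending.get()
            self.secondary[key] = value
            self._pending.task_done()

    def wait_until_synced(self):
        # Block until every queued change has been applied to the secondary.
        self._pending.join()
```

The trade-off this models is that the secondary may briefly lag behind the primary, in exchange for the primary never having to wait on the replication link.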

For my purposes, I carried on running vCenter Server in a VM because it was the easiest and quickest way of getting vCSHB up and running. But the steps taken and the experience I had would be much the same whether vCenter Server was physical or virtual. After all, vCSHB sits in the guest operating system.

The service can be configured to work solely on a private LAN, or it can be stretched across two sites in a so-called "DR Mode." In DR mode, you have the option to use different IP ranges and to have a user account send DNS updates in the event of failover, thus redirecting users and services to a functioning vCenter Server.

The primary and secondary vCenter Servers each have two NICs. The first handles the primary incoming traffic (the Principal Public Network) from the vSphere Client or from other management systems such as VMware View or VMware SRM; the second carries replication traffic and the "heartbeat" signal, and is referred to as the "VMware Channel." The heartbeat is used to detect a failure and trigger a failover. Split brain is avoided by forcing the vCenter Server Heartbeat service to "ping" other nodes on the network, such as a router, DNS server or global catalog server. In this way "false positives," where failover happens accidentally, can be avoided, as the primary vCenter Server has to be totally isolated before the failover process can be triggered. Optionally, the vCenter Server Heartbeat service can also be configured to monitor degraded performance and trigger remediation tasks in response.

Figure 1
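The split-brain check described above can be sketched in a few lines of Python. This is my own simplified reading of the idea, not vCSHB's actual algorithm: the secondary only takes over when the heartbeat has been missing for a while and it can still reach at least one other node on the network, which suggests that it is the primary, not the secondary, that has dropped off. The function name, the max_missed threshold and the is_reachable callback are all hypothetical.

```python
from typing import Callable, Iterable


def should_promote_secondary(
    missed_heartbeats: int,
    witnesses: Iterable[str],
    is_reachable: Callable[[str], bool],
    max_missed: int = 3,  # hypothetical threshold, not a vCSHB default
) -> bool:
    """Decide, from the secondary's point of view, whether to take over."""
    # A few missed heartbeats alone could be a glitch on the channel network.
    if missed_heartbeats < max_missed:
        return False
    # If no witness (router, DNS server, and so on) answers, the secondary is
    # probably the isolated node, and promoting it would risk split brain.
    return any(is_reachable(host) for host in witnesses)
```

In a real deployment, is_reachable would wrap an ICMP ping or similar probe; it is passed in here so that the decision logic can be exercised without touching the network.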

vCSHB monitors all the changes taking place on the vCenter Server in real time; it ships the data changes made in the file system and registry and synchronises them with the secondary vCenter Server. I found the setup and configuration of the vCSHB product to be very easy, and it was heartening indeed that I could install an availability product without needing to reboot vCenter Server or claim a maintenance window!

Easier if vCenter is a VM
At the risk of making a cheap shot at folks who run their vCenter Server on physical systems, setup is easier still if your vCenter is a virtual machine! The reason is that to create the secondary, you must build a VM or physical machine that is identical to the primary vCenter Server.

If the configuration were for a physical primary and a physical secondary, this would mean buying a new server and then either cloning the original with disk cloning tools or allowing the vCSHB product to copy the configuration to the secondary. If the vCenter Server is a VM, it is possible to use the hot-clone feature of vCenter Server to create this "pre-cloned" secondary. I found this approach much quicker, easier and less prone to error.

Figure 2

Note: This cloning process will trigger a reactivation event in Windows 2008, as Windows is designed to detect cloning and treat the clone as a new deployment.

If you are running vCenter Server on a physical machine, perhaps the quickest way to create a virtual secondary vCenter Server is to use VMware vCenter Converter to carry out a physical-to-virtual (P2V) conversion of the existing physical vCenter Server. By doing this you guarantee that you end up with two systems that are identical in every respect.

When I cloned my primary vCenter Server I took care not to run any "Guest Customisation" against the clone, so it would remain a complete identikit of the primary vCenter Server. Additionally, I made sure that the clone was attached to a Standard vSwitch with no physical NICs attached.

This was mainly because I was concerned about creating an IP conflict with the primary vCenter Server. With hindsight (and if I'd read the vCSHB Reference Guide more closely!) it would have been easier to use the Edit Settings dialog box to disconnect the Principal Public Network and leave the VMware Channel network connected. That way I would have been able to disconnect and reconnect the secondary vCenter Server as and when the setup required it.

The next step I took was to hot-add a second NIC to the primary vCenter Server and configure an IP address for the VMware Channel. I also took the optional step of renaming each "Local Area Connection" in Windows to reflect the purpose of the interface. The "Principal Public Network" was on the 192.168.3.x subnet, and the VMware Channel was configured on a 172.168.x.y subnet.

Figure 3
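When putting the two NICs on different subnets, as described above, a quick sanity check that the ranges do not overlap can save some head-scratching later. The sketch below uses Python's standard ipaddress module; the host addresses and the /24 prefix lengths in the usage line are assumptions for illustration, since the article only gives the 192.168.3.x and 172.168.x.y ranges, and the function name is hypothetical.

```python
import ipaddress


def channels_are_separate(public_cidr: str, channel_cidr: str) -> bool:
    """Return True if the two interface addresses sit on non-overlapping subnets."""
    # ip_interface accepts a host address with a prefix, e.g. "192.168.3.10/24",
    # and .network gives the subnet that the address belongs to.
    public_net = ipaddress.ip_interface(public_cidr).network
    channel_net = ipaddress.ip_interface(channel_cidr).network
    return not public_net.overlaps(channel_net)


# Assumed example addresses; substitute the values from your own setup.
print(channels_are_separate("192.168.3.10/24", "172.168.0.10/24"))
```

Keeping the check on whole subnets rather than single addresses catches the case where both NICs were accidentally given addresses in the same range.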

I created a special port group on my distributed virtual switch (dvSwitch) called "vcha-heartbeat" with teamed vSwitches to ensure that there was appropriate network redundancy throughout. This was used for the VMware Channel network. Of course, vCSHB doesn't require dvSwitches; it works just as well with standard vSwitches, and it will protect any version of vCenter Server you care to acquire.

Editor's Note: Read part two of this series, vCenter HeartBeat: How to install vCSHB.

Mike Laverick

ABOUT THE AUTHOR: Mike Laverick is a professional instructor with 15 years of experience with technologies such as Novell, Windows and Citrix, and has been involved with the VMware community since 2003. Laverick is a VMware forum moderator and member of the London VMware User Group Steering Committee. In addition to teaching, Laverick is the owner and author of the virtualisation website and blog RTFM Education, where he publishes free guides and utilities aimed at VMware ESX/VirtualCenter users. In 2009, Laverick received the VMware vExpert award and helped found the Irish and Scottish user groups. Laverick has had books published on VMware Virtual Infrastructure 3, VMware vSphere 4 and VMware Site Recovery Manager.
