For some time, I've been a big advocate of virtualising vCenter, VMware's management system. In fact, I've been doing it since vCenter 1.0. In 2004 and 2005, VMware's recommendation was that vCenter should be installed on a physical Windows system in production environments, but that was mainly because a VM on ESX 2.x could only be given 2 vCPUs and a measly 3 GB of RAM.
Then VMware Infrastructure 3 (VI3) came around and the game changed -- virtualising vCenter became completely viable once a virtual machine (VM) could address 4 vCPUs and 16 GB of RAM. At this stage, likely wary of the highly contentious nature of the debate, the message coming from VMware was that it was neutral on the issue -- VMware would support your virtual infrastructure whether vCenter was physical or virtual. With vSphere 4, that position shifted again, with VMware now recommending a virtual vCenter as a best practice.
At the heart of this issue was an ideological problem. There are some folks who think that running vCenter on the very system it manages creates a catch-22 scenario that could bite you. This divide persists despite the fact that in vSphere 4, a vCenter system can reside inside a VMware High Availability- and Distributed Resource Scheduler-enabled cluster. Heck, you can even enable VMware Fault Tolerance on the darn thing.
When confronted by advocates of "physicalisation," I always try to remind them that running vCenter on a physical box introduces a whole set of anxieties about its availability. Most of my customers accept this argument and realise that if they paid a truckload of money in expensive VMware licensing to get their paws on VMware HA/DRS, then they should utilise these VMware clustering features as much as possible.
Beginning with vSphere 4, the VMware position has shifted again. The official line now is that virtualising vCenter is the recommended configuration and VMware best practice. So there has been a considerable shift from the early days of vCenter 1.0 through to vCenter 4.0.
Virtualising vCenter gotcha
Despite this shift, there's a whopping gotcha at the heart of virtualising vCenter that affects VMware Update Manager (VUM) patching automation. You should be able to patch a whole cluster of ESX hosts in an HA/DRS cluster, and "automagically" VMware DRS should move all your VMs off an ESX host, place it in maintenance mode, patch and reboot it, and then move on to the next host. In VMware VI3, the intention was to make patching ESX a single-click event. However, VMware changed this functionality in vSphere 4 without bothering to alert many VMware admins. In vSphere 4, if VMware Update Manager detects that the ESX host it is about to patch is running either the VUM or the vCenter VM, then the patching process halts with an untimely and not especially friendly message.
It's something I discovered whilst writing my book on vSphere 4. At the time, I assumed that this was a temporary glitch that would be fixed by the time vSphere 4 launched. Unfortunately, it was not, and it's still the case in vSphere 4 Update 1.
I recently discussed this with fellow virtualisation blogger Jason Boche on my "Chinwag" podcast. It seems clear that the loss of this VUM automation functionality has big implications for VMware customers. In our discussion, Boche made it clear that it essentially means a VMware admin has to manually move vCenter off the ESX host in question and trigger the remediation process on a per-ESX host basis -- that's not much fun if the cluster in question has the maximum of 32 hosts. The days of right-clicking the cluster and remediating every host in it are over for the time being. The rumour is that VMware changed this functionality very close to the release of vSphere 4, so with luck they will reverse this "by design" decision in subsequent releases.
How to work around it
In the meantime, for workarounds, you have a number of options. You could run vCenter in a separate cluster in a different vCenter instance -- clearly an expensive option if you don't have that configuration already in place. Alternatively, vCenter could be virtualised onto a dedicated management server, but you would lose the precious protection of HA/DRS. It seems that in the short term, folks will resort to using PowerCLI scripts to move the virtual vCenter to another host prior to triggering remediation -- so much for set-and-forget, point-and-click patch management.
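To make the "move vCenter first, then remediate" workaround concrete, here is a minimal sketch of the ordering logic such a script would need. It is written in Python purely for illustration and does not talk to vCenter at all; the host names, the VM names and the plan_rolling_remediation function are all hypothetical. In a real environment each "migrate" step would be a VMotion and each "remediate" step a per-host remediation, triggered from PowerCLI or the vSphere Client.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical names for the management VMs that cause remediation to halt.
MANAGEMENT_VMS = {"vcenter01", "vum01"}


def plan_rolling_remediation(
    placement: Dict[str, Set[str]]
) -> List[Tuple[str, str, str]]:
    """Return ordered steps for patching every host in a cluster.

    placement maps each host name to the set of VMs it currently runs.
    Each step is ("migrate", vm, target_host) or ("remediate", host, "").
    """
    # Work on a copy so the caller's inventory is left untouched.
    current = {host: set(vms) for host, vms in placement.items()}
    hosts = sorted(current)
    if len(hosts) < 2:
        raise ValueError("need at least two hosts to evacuate management VMs")

    steps: List[Tuple[str, str, str]] = []
    for index, host in enumerate(hosts):
        # Prefer the previous host in the order, which is already patched.
        # The first host has no predecessor, so it uses the next host instead.
        target = hosts[index - 1] if index > 0 else hosts[1]
        for vm in sorted(current[host] & MANAGEMENT_VMS):
            current[host].remove(vm)
            current[target].add(vm)
            steps.append(("migrate", vm, target))
        # The host no longer runs vCenter or VUM, so it can be remediated.
        steps.append(("remediate", host, ""))
    return steps


if __name__ == "__main__":
    cluster = {
        "esx01": {"vcenter01", "app1"},
        "esx02": {"vum01"},
        "esx03": {"app2"},
    }
    for step in plan_rolling_remediation(cluster):
        print(step)
```

The point of the sketch is that the management VMs have to hop between hosts as the patching rolls through the cluster, which is exactly the babysitting that the old one-click remediation spared you.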
All this seems quite ironic given VMware best practices are now at odds with VMware's own patch management system. Some people might argue that it's a case of the left hand not knowing what the right hand is doing within VMware. My fear is that it will give the small army who are anti-virtual vCenter an excuse to advocate "physicalisation."
ABOUT THE AUTHOR: Mike Laverick is a professional instructor with 15 years' experience in technologies such as Novell, Windows and Citrix, and he has been involved with the VMware community since 2003. Laverick is a VMware forum moderator and member of the London VMware User Group Steering Committee. In addition to teaching, Laverick is the owner and author of the virtualisation website and blog RTFM Education, where he publishes free guides and utilities aimed at VMware ESX/VirtualCenter users. In 2009, Laverick received the VMware vExpert award and helped found the Irish and Scottish user groups. Laverick has had books published on VMware Virtual Infrastructure 3, VMware vSphere 4 and VMware Site Recovery Manager.