Many people think the key to the virtual data centre is getting as many virtual machines (VMs) on a physical server as possible. It is still the case, however, that prohibitive memory costs are a roadblock to high consolidation ratios, and virtualisation availability tools still lag behind business need. They have not yet cancelled out the "eggs-in-one-basket" scenario that virtualisation endemically brings with it.
If you spend some time reading performance and scalability marketing literature and white papers from the major virtualisation vendors such as VMware, Microsoft and Citrix, you would assume that the be-all and end-all of the virtual data centre is the virtual machine consolidation ratio. It's quite clear why: People often drill down on this issue, as it's all about driving down cost. The more VMs you get on a single physical server, the lower the cost is going to be.
VMware's real objective
It seems not a week goes by that VMware doesn't argue that some feature of its virtualisation platform, vSphere4, offers some unique technology to increase the consolidation ratio of virtual machines to physical servers. It could be that their hypervisor (VMware ESX) supports more physical RAM than any other. Or maybe it's that the adoption of unique features such as Transparent Page Sharing (TPS) or Memory Over-commitment will up the all-important consolidation ratio on their platform when compared to their competitors.
The real agenda behind these arguments, of course, is VMware's attempt to fight back against the PR engines of Microsoft and Citrix that constantly attack VMware's perceived Achilles heel of cost. What VMware wants people to think is this: Yes, on paper XenServer or Hyper-V may be cheaper to license, but you will need to buy more servers to get the same concurrency of virtual machines. VMware is more cost-effective, the argument goes, because you get greater densities of VMs per host. And indeed there are some merits to this argument.
But, of course, such a narrow focus skews the bigger picture, and I believe that between them the vendors run the risk of creating an elephant in the room. The elephant in the room is that server consolidation ratios are more or less meaningless if you don't realise how expensive the memory required to reach the next generation of consolidation ratios actually is, and that the availability technologies currently offered by the virtualisation vendors make putting those high consolidation ratios in place a challenge.
How many VMs fit on a server?
One of the most common questions I've been asked in recent years is, "How many VMs can you get on one physical server?" The answer is always, "it depends."
It depends on your workloads, and it depends on how much physical RAM you have. Ignoring the benefits of memory over-commitment for a moment, the bottom line is that most people run out of physical memory before they run out of CPU cycles in the virtual data centre. The more RAM you have, the more VMs you get, and although memory over-commitment allows us to squeeze the maximum efficiency from the physical layer, no one actually wants swap activity to take place. As time has gone by, people have looked for sweet spots of n GB of memory to n cores.
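To see why RAM is usually the limiting factor, a back-of-the-envelope calculation helps. The figures below are illustrative assumptions of my own (host RAM, hypervisor overhead, average VM footprint), not vendor sizing guidance, and they deliberately ignore over-commitment:

```python
# Rough, illustrative estimate of how many VMs fit on a host when
# physical RAM is the limiting resource and every VM's memory is
# fully reserved (no over-commitment). All figures are assumptions
# for the sake of the example.

def vms_per_host(host_ram_gb, hypervisor_overhead_gb, avg_vm_ram_gb):
    """Return the number of VMs that fit in the usable RAM."""
    usable = host_ram_gb - hypervisor_overhead_gb
    return int(usable // avg_vm_ram_gb)

# A host with 96GB of RAM, roughly 2GB reserved for the hypervisor,
# and an average VM footprint of 4GB:
print(vms_per_host(96, 2, 4))   # 23 VMs
# The same host with heavier 8GB VMs:
print(vms_per_host(96, 2, 8))   # 11 VMs
```

On a typical dual-socket, eight-core host, 23 VMs leaves plenty of CPU headroom, which is exactly the point: you hit the memory ceiling first.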
The price you pay for memory
Right now, the sweet spot still appears to be 4GB sticks of memory and filling all the banks of the host. Looking at the cost of 8GB memory, it simply doesn't stack up economically for many organisations…yet. So when a virtualisation vendor trumpets the fact that their hypervisor supports 1TB or 2TB of physical RAM, take a step back and ask yourself the question, "Would you ever get budget approval for that specification?"
For example, the latest HP-branded memory currently ships at £163 for 4GB, whereas 8GB is priced at £740. If you want 16GB sticks of RAM, then that's shipping at £1,249. It doesn't take much of a mathematician to multiply the number of banks you have by the memory sizes, and to work out that a strategic use of scale-out is more economic than scale-up.
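The per-gigabyte premium behind that maths is easy to check. A quick sketch using the stick prices quoted above (point-in-time figures that will obviously date):

```python
# Price-per-GB comparison using the HP-branded stick prices
# quoted in the article (GBP, at time of writing).
prices = {4: 163, 8: 740, 16: 1249}  # stick size in GB -> price in GBP

for size_gb, price in prices.items():
    print(f"{size_gb}GB stick: £{price / size_gb:.2f} per GB")

# 4GB stick:  £40.75 per GB
# 8GB stick:  £92.50 per GB
# 16GB stick: £78.06 per GB
```

At more than double the price per gigabyte for the denser sticks, adding another modestly specified host (scale-out) beats cramming one host full of 8GB or 16GB DIMMs (scale-up).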
This problem of unattainable consolidation ratios isn't just restricted to the current cost of memory. It exposes a critical anxiety at the very heart of the virtual data centre project. Quite naturally, people worry that performance might degrade if a host carries too many VMs, and about the impact that the physical failure of a host would have on service delivery.
Unfortunately, organisations can't get away from this entirely. As the number of VMs on a host grows, so does the eggs-in-one-basket anxiety that virtualisation brings. I've often asked my new clients how many hosts they have, and how many VMs on average reside on them. I've often been surprised and disappointed at how low the consolidation ratio is.
Why are VM consolidation ratios low?
The cause of this can be legitimate, with engineers being over-conservative and running only eight VMs when they could easily be running double that amount. There's an element of "cover your back" at play here.
In some respects, people like myself have been responsible for creating this knee-jerk conservatism. Overly focusing on the "low-hanging fruit" (the easy virtualisation candidates of the early years of the virtual data centre) has made people hang back from virtualising workloads, and from consolidation ratios, that are well within the capabilities of a modern hypervisor. Of course, focusing on the easy candidates to virtualise was just a piece of political pragmatism at a time when many within IT shops were hostile to the very concept.
Availability over performance
I firmly believe that with the right hardware specifications, 99% of x86 workloads can be virtualised. My anxiety is no longer performance but availability. In recent years, I have seen a gap opening between the high consolidation ratios and the availability technologies currently available. Most virtualisation vendors have some kind of clustering technology like VMware HA, which will restart a VM in the event of a host failure. But it is just that -- a restart.
There's no promise of what state the VM might be in when it is powered on, or what consistency checking the application may need. If applications are multi-tiered with many service dependencies, a restart of just one VM may not be that helpful at all. All this adds up to a service that may or may not be available. For this reason, I think such HA technologies are really only useful for applications that come with some type of application availability built into them and that scale out for both availability and performance, such as Microsoft Active Directory, Terminal Services and Web servers.
What continuous availability has brought us
To some degree, the limitations of these types of technologies have been addressed by the arrival of continuous availability, the shape of VMware's Fault Tolerance. With VMware FT, the "primary" VM on one ESX host is mirrored in real time to a "secondary" on a different host. Literally everything that happens in one VM is reproduced on the other.
It's a fantastic technology, and for once, VMware played the ace card by including it in more of the lower SKUs. It's great for single stand-alone VMs that need better-than-average availability and cannot be protected by other methods for technical or cost reasons. VMware FT is terrific, and it's a surprise to me that VMware doesn't crow more about it in their marketing and PR literature. It's real bleeding-edge technology that demonstrates how far ahead VMware are of their nearest competitors.
Why VMware FT needs a little help
However, you'd be wrong to think that VMware FT is a silver bullet. It requires the latest CPUs with the "vLockstep" capability, and it is limited to just a single vCPU inside the VM. Currently, an FT-enabled VM loses some of the sweet things that virtualisation brings to the table, such as hot backups of the VM without the need for in-guest agents. It's also currently incompatible with some of the automation that VMware brings in the shape of DRS, DPM and VUM.
Additionally, VMware FT has scalability issues -- it simply isn't possible to protect every VM with VMware FT, and it might not even be economic given the overhead VMware FT imposes. True, in the future these incompatibilities are likely to be engineered out of the vSphere4 platform. But let's deal with where we are now, rather than a roadmap.
No, the real problem with VMware FT is the old problem of "garbage in equals garbage out." Computers cannot give the right answer if they have the wrong data. If there is a problem with the "primary" VM, in terms of the services or operating system, these are merely duplicated in the "secondary" VM. A blue screen of death (BSOD) in one would mean a BSOD in the other.
Currently, VMware has no offering that addresses these availability issues within the guest operating system. One could argue that, as a virtualisation vendor, it isn't their natural territory. However, their competitor Microsoft does have an offering elsewhere in the stack, in the shape of Microsoft Cluster Service (MSCS) technology. It's worth saying that for some time these "classic" or "legacy" methods of delivering availability were incompatible with the newer methods. We are only now beginning to see these competing technologies grudgingly recognise each other.
As you can see, we've come a long way from our starting position, which was that server consolidation is the be-all and end-all of the virtual data centre. The important factor remains the thorny issue of availability. Clearly, clustering technology at the virtual layer reduces costs and complexity, but the classic methods cannot be dismissed lightly, whether it is MSCS or products like NeverFail. The challenge for the virtual data centre is fitting the right technology to deliver the required level of availability. On our journey to the cloud, I believe a Rubicon (a point of no return) will have to be crossed. If the cloud is going to deliver scalability and availability, then some of the virtualisation vendors may be forced to consider the service and application layer as well, rather than leaving that to their competitors.
ABOUT THE AUTHOR: Mike Laverick is a professional instructor with 15 years' experience in technologies such as Novell, Windows and Citrix, and he has been involved with the VMware community since 2003. Laverick is a VMware forum moderator and member of the London VMware User Group Steering Committee. In addition to teaching, Laverick is the owner and author of the virtualisation website and blog RTFM Education, where he publishes free guides and utilities aimed at VMware ESX/VirtualCenter users. In 2009, Laverick received the VMware vExpert award and helped found the Irish and Scottish user groups. Laverick has had books published on VMware Virtual Infrastructure 3, VMware vSphere4 and VMware Site Recovery Manager.