Server and desktop virtualisation are key technologies driving forward today’s data centres. They bring huge efficiencies to IT that just weren’t possible in an all-physical world. But delivering virtual machine storage brings big challenges and requires new approaches to ensure resources are delivered optimally. In this article we look at the key methods and technologies used to optimise virtual machine storage for virtual server and desktop environments.
The fundamental issue for virtual machine storage is that virtual server and desktop I/O is unpredictable in its timing and often arrives in great volume.
Whereas once one physical server drew upon its own dedicated disks, now many virtual servers occupy one box and can often demand I/O simultaneously from shared storage. For example, as virtual machines start up and shut down, they generate large amounts of read and write I/O. This is especially true for virtual desktop infrastructures (VDI), where “boot storms” occur as many desktops start up or shut down at the same time when workers begin or end their working day.
There are a number of ways in which SAN administrators can improve I/O performance and help mitigate some of the more common issues associated with virtual machine storage.
On the array: Wide striping and storage tiering
Two key strategies for SAN performance enhancement in virtual environments are wide striping and storage tiering.
Wide striping involves spreading I/O across as many physical spindles as possible. For SAN environments where virtual machine LUNs can be especially large, wide striping is essential to ensure a sufficient number of IOPS can be delivered to virtual hosts.
Most storage vendors offer the ability to aggregate multiple RAID groups of disks into pools that can then be carved into LUNs. When implementing wide striping, a pool of multiple RAID groups is better than one large RAID group: recovery from disk failures is quicker, the performance impact is smaller, and, crucially, the statistical risk of multiple disk failures in the same RAID group (and therefore of data loss) is much reduced.
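To make the effect of wide striping concrete, the sketch below maps logical blocks to disks using simple round-robin striping and counts how many blocks each spindle has to serve. It is an illustration only: the function names, the 64-block stripe unit and the disk counts are assumptions, and real arrays layer RAID and pooling on top of this basic idea.

```python
from collections import Counter

def disk_for_block(lba, stripe_blocks, num_disks):
    """Index of the disk holding a logical block under round-robin striping."""
    return (lba // stripe_blocks) % num_disks

def io_per_disk(lbas, stripe_blocks, num_disks):
    """Count how many of the requested blocks land on each disk."""
    counts = Counter(disk_for_block(lba, stripe_blocks, num_disks) for lba in lbas)
    return [counts.get(disk, 0) for disk in range(num_disks)]

# Hypothetical workload: 4,096 consecutive blocks, stripe unit of 64 blocks.
workload = range(4096)
narrow = io_per_disk(workload, stripe_blocks=64, num_disks=4)
wide = io_per_disk(workload, stripe_blocks=64, num_disks=32)

print("busiest disk, 4 spindles: ", max(narrow))
print("busiest disk, 32 spindles:", max(wide))
```

With the same workload, each spindle in the wider stripe handles an eighth of the blocks, which is why the number of spindles behind a large LUN matters as much as its capacity.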
Tiering, meanwhile, places data onto differing classes of storage, based on performance needs. High-performance virtual machines demand faster disks or solid-state devices. Test and development machines may be more suited to SATA or cheaper disk platforms.
Dynamic tiering adds another layer of performance improvement, mixing multiple device types within the same disk pool and providing automated features to balance data across disk and solid-state storage, based on I/O demand. Active data is promoted to faster media where required and can be demoted to slower ones when not in use.
Using dynamic tiering at the block level can significantly improve performance for a small incremental cost, as only a small percentage of data is usually ever active within a server at any one time. This workload can be addressed with a small amount of faster disk or flash.
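The economics of dynamic tiering can be sketched in a few lines. The example below ranks extents by access count, places the hottest ones on a limited number of flash slots and reports what share of I/O the flash tier would serve. The extent names, the access counts and the single flash slot are invented for illustration, and vendors' actual promotion policies are more sophisticated than a simple sort.

```python
def plan_tiers(access_counts, flash_slots):
    """Split extents into (flash, disk) sets, hottest extents first."""
    ranked = sorted(access_counts, key=lambda ext: (-access_counts[ext], ext))
    return set(ranked[:flash_slots]), set(ranked[flash_slots:])

def flash_io_share(access_counts, flash):
    """Fraction of all I/O that is directed at extents held on flash."""
    total = sum(access_counts.values())
    if total == 0:
        return 0.0
    return sum(access_counts[ext] for ext in flash) / total

# Hypothetical access counts for five extents over one sampling period.
counts = {"ext0": 900, "ext1": 50, "ext2": 30, "ext3": 10, "ext4": 10}
flash, disk = plan_tiers(counts, flash_slots=1)
share = flash_io_share(counts, flash)

print("on flash:", sorted(flash))
print("share of I/O served from flash:", share)
```

In this made-up case one extent in five sits on flash yet absorbs most of the I/O, which is the skew that makes a small amount of fast media worthwhile.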
Although not strictly a tiering solution, NetApp Flash Cache can be used to improve virtual machine I/O performance with NetApp filers. This device sits within the filer and acts as a large memory cache for I/O reads from disk. Performance of read I/O is significantly enhanced by serving frequent read requests from the Flash Cache card.
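The general principle of a read cache in front of disk can be shown with a small least-recently-used (LRU) cache. This is a generic sketch and not a description of how Flash Cache is implemented; the class name, the two-block capacity and the dictionary standing in for disk are all assumptions.

```python
from collections import OrderedDict

class ReadCache:
    """Tiny LRU read cache in front of a slower backing store."""

    def __init__(self, capacity, backend):
        self.capacity = capacity
        self.backend = backend          # dict standing in for disk
        self.cache = OrderedDict()      # block -> data, oldest first
        self.hits = 0
        self.misses = 0

    def read(self, block):
        if block in self.cache:
            self.cache.move_to_end(block)   # mark as most recently used
            self.hits += 1
            return self.cache[block]
        self.misses += 1
        data = self.backend[block]          # slow path: go to disk
        self.cache[block] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used
        return data

disk = {n: f"data-{n}" for n in range(10)}
cache = ReadCache(capacity=2, backend=disk)
for block in [1, 2, 1, 1, 3, 2]:
    cache.read(block)

print("hits:", cache.hits, "misses:", cache.misses)
```

Repeated reads of the same blocks are served from the cache, so the more a workload re-reads a small working set, the fewer requests reach the disks.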
It is possible to use other techniques on the array to improve performance, such as short stroking disks or deploying all-flash arrays. These tend to be more expensive solutions, as they are either wasteful of resources or deliver extra I/O performance to data that doesn’t need it.
At the server: Adding flash
An alternative to improving array I/O performance is to look at moving storage closer to the virtual machine by placing it into the same physical server as the hypervisor. Shortening the I/O path improves response time and delivers greater throughput.
Products such as Fusion-io’s ioTurbine enable PCIe flash products to act as a server-based cache in VMware ESXi, dramatically improving virtual machine performance.
Of course, there are always compromises with this kind of technology. Data on an internal PCIe flash card is closely associated with the server and is at risk of isolation or loss if a catastrophic event should occur on the server. PCIe flash cards work well in configurations such as VDI, where a master image can be loaded from external disk and cached for read-only access.
Dedicated virtualisation storage products
We shouldn’t forget that a number of vendors are developing software and hardware solutions specifically to address I/O for virtual environments.
Virsto, for example, offers a software solution for Microsoft Hyper-V server and VMware vSphere and VDI deployments that acts as a virtual disk device. Random writes are converted into sequential I/O by storing all updates on a log disk. The process is analogous to the way databases write a sequential log and then asynchronously update the database at a later point in time.
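The log-based approach described above can be sketched as follows. This is a toy model of the general technique, not Virsto’s implementation: the class and method names are hypothetical, and Python lists and dictionaries stand in for the log disk and the backing store.

```python
class LogStructuredDisk:
    """Toy virtual disk that turns random writes into sequential log appends."""

    def __init__(self):
        self.log = []     # sequential log of (lba, data) records
        self.index = {}   # lba -> position of the newest record in the log
        self.base = {}    # backing store, updated asynchronously

    def write(self, lba, data):
        # Every write is an append, wherever the block lives logically.
        self.index[lba] = len(self.log)
        self.log.append((lba, data))

    def read(self, lba):
        # Newest data may still be in the log; otherwise use the backing store.
        position = self.index.get(lba)
        if position is not None:
            return self.log[position][1]
        return self.base.get(lba)

    def destage(self):
        # Later, apply the newest version of each block to the backing store.
        for lba, position in self.index.items():
            self.base[lba] = self.log[position][1]
        self.log.clear()
        self.index.clear()

vdisk = LogStructuredDisk()
for lba, data in [(907, "a"), (12, "b"), (455, "c"), (12, "d")]:
    vdisk.write(lba, data)

log_length_before = len(vdisk.log)
latest = vdisk.read(12)
vdisk.destage()
print(log_length_before, latest, sorted(vdisk.base))
```

The four writes target scattered addresses but reach the log as four consecutive appends, and the overwritten block is written to the backing store only once when the log is destaged.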
Atlantis Computing’s ILIO (Inline Image Optimisation) product accelerates VDI environments. It is delivered as a virtual storage appliance that optimises I/O between storage and virtual desktops. ILIO has the advantage of reducing I/O so significantly that VDI deployments can be placed on standard SATA drives. Atlantis claims I/O savings of up to 90%.
Tintri offers a hardware-based optimisation solution called VMstore. This uses a mixture of solid-state and SATA drives with data held in VMware data storage rather than LUNs to deliver what is described as “VM aware” storage. It’s worth remembering that for VMware, a virtual machine is simply a number of files. By optimising the file access, performance can be greatly enhanced.
Nutanix has taken a hybrid approach with its Complete Cluster offering. This combines processing and storage into one device, with each server in the cluster operating as a hypervisor and storage array. Data is stored on a mixture of flash and hard drives. The hybrid approach removes the need to deploy a dedicated SAN but could suffer from scalability issues when trying to balance the right amount of storage versus computing power. In addition, the storage resources are not shareable with other platforms.
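The balancing problem with combined compute-and-storage nodes comes down to simple arithmetic: the cluster must be sized for whichever resource runs out first. The sketch below uses entirely hypothetical figures (200 virtual machines, 120TB of data, 50 virtual machines and 10TB per node) that are not taken from any vendor’s specifications.

```python
import math

def nodes_required(vms, capacity_tb, vms_per_node, tb_per_node):
    """Return (total, for_compute, for_storage) node counts for a cluster."""
    for_compute = math.ceil(vms / vms_per_node)
    for_storage = math.ceil(capacity_tb / tb_per_node)
    return max(for_compute, for_storage), for_compute, for_storage

# Hypothetical storage-heavy workload.
total, for_compute, for_storage = nodes_required(
    vms=200, capacity_tb=120, vms_per_node=50, tb_per_node=10
)
surplus_compute_nodes = total - for_compute

print("nodes needed for compute:", for_compute)
print("nodes needed for storage:", for_storage)
print("nodes to buy:", total)
print("nodes' worth of surplus compute:", surplus_compute_nodes)
```

Because every node adds both resources, a workload that is heavy on one ends up paying for unused capacity in the other.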