Virtualisation and the LUN: Storage configuration for VMs

Storage admins used to match LUNs to physical servers, but that’s all changed. Find out how in this guide to the basics of VM storage.

Chris Evans

Published: 19 Jun 2013

Providing storage in a physical environment required storage administrators to match LUN storage partitions to the performance and availability characteristics needed for each server.

But with the advent of server virtualisation, that’s all changed.

In a virtual environment, storage resources are abstracted not by carving out LUNs (logical unit numbers) but by the hypervisor. LUNs still exist, but each one usually acts as a large pool from which virtual storage is assigned to individual guests.

This pooling means storage and virtualisation administrators need to do some additional planning and design to ensure storage resources perform in a timely and efficient fashion.

Hypervisors emulate storage devices

In virtual server environments, the storage presented to the guest is abstracted from the physical storage and is represented by generic emulated SCSI or IDE devices.

In VMware environments, these were initially BusLogic and LSI Logic emulations of parallel SCSI devices, and have since been expanded to include newer SAS versions. Hyper-V presents boot disks to the guest through an emulated IDE controller and can use SCSI devices for non-boot disks. The key point, though, is that regardless of the underlying storage used by the hypervisor, the guest still sees an emulated IDE, SCSI or SAS device connected through a virtual controller.

Virtual storage tips

Create datastores and physical disk volumes as large as possible.
Apply existing standards on multi-pathing and RAID protection to the physical datastores and volumes.
Group similar workloads together within a single datastore/volume.
Use thin provisioning to size virtual guest disks to their expected maximums without physically allocating that space up front, remembering individual vSphere disks have a maximum 2TB limit (64TB for Hyper-V VHDX).
Implement monitoring and alerting to manage the growth of thin provisioned virtual disks.
Where supported in the guest operating system, use the latest version of virtual SCSI and SAS drivers to get the best levels of performance.
Use raw device mapping to present physical disks to guests that need direct SCSI access.
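The per-disk size limits in the tips above can be checked before a virtual disk is provisioned. The helper below is a hypothetical sketch, not part of any VMware or Microsoft tool; the limits are the figures quoted in this guide, and treating 1TB as 2**40 bytes is an assumption.

```python
# Hypothetical helper: per-disk limits as quoted in this guide,
# assuming binary units (1TB = 2**40 bytes).
TB = 2**40

MAX_DISK_BYTES = {
    "vmdk": 2 * TB - 512,  # vSphere VMDK: 2TB minus 512 bytes
    "vhd": 2 * TB,         # Hyper-V VHD: roughly 2TB
    "vhdx": 64 * TB,       # Hyper-V VHDX (Windows Server 2012)
}

def check_disk_size(fmt: str, size_bytes: int) -> bool:
    """Return True if one virtual disk of size_bytes fits within the format limit."""
    try:
        limit = MAX_DISK_BYTES[fmt.lower()]
    except KeyError:
        raise ValueError(f"unknown virtual disk format: {fmt!r}") from None
    return 0 < size_bytes <= limit

print(check_disk_size("vmdk", 2 * TB))   # False: 512 bytes over the VMDK limit
print(check_disk_size("vhdx", 10 * TB))  # True
```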

Device emulation means the physical data of a virtual machine can be moved around the storage in a virtual infrastructure without any impact on the guest. However, it does impose some limitations: first, there are limits to the size of individual disk volumes; second, only standard SCSI commands are passed through to the emulated device.

This can be an issue for servers that need to access array control devices. In such cases, disks can be connected directly without emulation. In VMware environments these are known as RDMs (Raw Device Mappings). The latest release of Hyper-V, in Windows Server 2012, provides a feature called virtual Fibre Channel that allows Fibre Channel devices to be connected directly to the guest machine without device emulation.

Virtual disks – VMDKs and VHDs

Virtual disk drives are stored by the hypervisor as files, with effectively one file per guest volume. In vSphere, the files are known as VMDKs (Virtual Machine Disks), whereas Hyper-V stores them as a VHD – virtual hard disk.

Within vSphere, a VMDK can be stored on an NFS share or on a block device (Fibre Channel or iSCSI) that has been formatted with the Virtual Machine File System (VMFS).

A single VMDK is limited to 2TB minus 512 bytes in size, which implies a 2TB limit on each virtual disk presented to a guest. Where a guest needs more than 2TB, the storage has to be presented as multiple virtual disks, which can then be combined into larger logical volumes within the guest.
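The boundary is easy to get wrong: because the maximum is 2TB minus 512 bytes, a volume of exactly 2TB does not fit in one VMDK. A minimal sketch of the arithmetic, assuming binary units (1TB = 2**40 bytes); the function name is hypothetical.

```python
TB = 2**40
VMDK_MAX_BYTES = 2 * TB - 512  # 2TB minus 512 bytes, as stated above

def vmdks_needed(guest_bytes: int, max_bytes: int = VMDK_MAX_BYTES) -> int:
    """Minimum number of virtual disks needed to present guest_bytes of storage."""
    if guest_bytes <= 0:
        raise ValueError("guest_bytes must be positive")
    # Integer ceiling division avoids floating-point rounding on large sizes.
    return (guest_bytes + max_bytes - 1) // max_bytes

print(vmdks_needed(VMDK_MAX_BYTES))  # 1
print(vmdks_needed(2 * TB))          # 2: exactly 2TB is 512 bytes too big
print(vmdks_needed(5 * TB))          # 3
```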

On Hyper-V, the VHD format is also limited to 2TB in size. With Windows Server 2012, Microsoft recently introduced the VHDX format, which allows individual virtual disks to scale to 64TB.

The physical storage used to hold virtual disks can be either block or NAS devices. VMware supports iSCSI, Fibre Channel, FCoE and NFS. Hyper-V supports Fibre Channel, iSCSI and SMB, the latter sometimes being referred to historically as CIFS.

The type of storage used is transparent to the virtual guest, as is the level of multi-pathing in place. Both are handled at the hypervisor layer, which should follow the standard good practice of providing multiple redundant paths to physical storage.
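Because multi-pathing lives in the hypervisor, a failed path is hidden from the guest. The sketch below illustrates the general idea of round-robin path selection with failover; it is a simplified model with hypothetical path names, not the actual path selection logic used by vSphere or Windows multipathing.

```python
from dataclasses import dataclass, field

@dataclass
class Path:
    name: str            # hypothetical label, e.g. HBA plus array controller
    healthy: bool = True

@dataclass
class RoundRobinSelector:
    paths: list[Path]
    _next: int = field(default=0, init=False)

    def select(self) -> Path:
        """Return the next healthy path in rotation, skipping failed ones."""
        for _ in range(len(self.paths)):
            path = self.paths[self._next]
            self._next = (self._next + 1) % len(self.paths)
            if path.healthy:
                return path
        raise RuntimeError("all paths to storage have failed")

sel = RoundRobinSelector([Path("hba0-ctrlA"), Path("hba1-ctrlB")])
print([sel.select().name for _ in range(4)])
# ['hba0-ctrlA', 'hba1-ctrlB', 'hba0-ctrlA', 'hba1-ctrlB']
sel.paths[0].healthy = False  # simulate a path failure
print([sel.select().name for _ in range(2)])
# ['hba1-ctrlB', 'hba1-ctrlB']
```

I/O carries on over the surviving path, which is why redundant paths matter more than the guest ever sees.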

Matching VMs to storage

Hyper-V and vSphere store virtual machines in larger “containers”.

Hyper-V uses local NTFS volumes or SMB/CIFS file shares. vSphere uses NFS shares or LUNs formatted with VMFS, known as datastores.

Prior to vSphere 5, the block size of a VMFS-3 datastore was chosen at format time from 1MB to 8MB, and it limited the size of the largest file, and therefore the largest virtual disk, the datastore could hold. A 1MB block size allowed files of up to 256GB, while a 2TB virtual disk needed an 8MB block size, effectively resulting in space being assigned to virtual guests in minimum increments of 8MB. VMFS-5 (released with vSphere 5) uses a uniform 1MB block size, regardless of the datastore size. For Hyper-V VHDs, the block increment is 2MB, irrespective of how the underlying NTFS file system is formatted.
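The relationship between block size, maximum virtual disk size and allocation granularity can be sketched as follows. The VMFS-3 table uses the commonly documented rounded figures (1MB to 256GB, 2MB to 512GB, 4MB to 1TB, 8MB to 2TB); treat exact byte boundaries as an assumption. The function names are hypothetical.

```python
MB = 2**20
GB = 2**30

# VMFS-3 block size (MB) -> approximate largest file it can hold.
VMFS3_LIMITS = [
    (1, 256 * GB),
    (2, 512 * GB),
    (4, 1024 * GB),
    (8, 2048 * GB - 512),  # capped by the overall VMDK limit
]

def vmfs3_min_block_mb(vmdk_bytes: int) -> int:
    """Smallest VMFS-3 block size (MB) that can hold a VMDK of this size."""
    for block_mb, max_file in VMFS3_LIMITS:
        if vmdk_bytes <= max_file:
            return block_mb
    raise ValueError("exceeds the 2TB VMDK limit")

def allocated_bytes(written_bytes: int, block_mb: int) -> int:
    """Space consumed when storage is handed out in whole blocks (rounded up)."""
    block = block_mb * MB
    return -(-written_bytes // block) * block

print(vmfs3_min_block_mb(300 * GB))             # 2
print(vmfs3_min_block_mb(1500 * GB))            # 8
print(allocated_bytes(10 * MB + 1, 8) // MB)    # 16 (8MB VMFS-3 blocks)
print(allocated_bytes(10 * MB + 1, 1) // MB)    # 11 (1MB VMFS-5 blocks)
print(allocated_bytes(10 * MB + 1, 2) // MB)    # 12 (2MB VHD blocks)
```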

In both hypervisor platforms, the container used to store virtual machines reflects the way physical storage is presented to the hypervisor, so all virtual guests in that container receive the same level of performance and availability. vSphere datastores and Hyper-V volumes should therefore be used to group similar types of virtual machines together. For example, the grouping might separate production guests from test/development guests, or place guests that need higher performance on tier 1 or SSD storage.
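A placement policy of this kind amounts to mapping each workload class to a container. The sketch below is illustrative only: the datastore names and workload classes are hypothetical, and real placement would be done with the hypervisor's own management tools.

```python
from collections import defaultdict

# Hypothetical mapping of workload class to datastore/volume.
DATASTORE_FOR_CLASS = {
    "production": "ds-prod-tier1",
    "test": "ds-testdev",
    "dev": "ds-testdev",
    "high-io": "ds-ssd",
}

def place_vms(vms: dict[str, str]) -> dict[str, list[str]]:
    """Group VM names by datastore according to each VM's workload class."""
    placement: dict[str, list[str]] = defaultdict(list)
    for vm, workload in vms.items():
        placement[DATASTORE_FOR_CLASS[workload]].append(vm)
    return dict(placement)

print(place_vms({"sql01": "production", "web-test": "test",
                 "build": "dev", "vdi-gold": "high-io"}))
# {'ds-prod-tier1': ['sql01'], 'ds-testdev': ['web-test', 'build'], 'ds-ssd': ['vdi-gold']}
```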

Consideration should also be given to the connectivity of physical storage and how it can affect performance. For example, in Fibre Channel environments there may be a benefit in dedicating separate Fibre Channel HBAs (host bus adapters) to high-performance storage, reducing contention from lower-performance virtual machines in a mixed environment.

Thin provisioning

Both vSphere and Hyper-V support thin provisioned virtual machines, meaning virtual disks grow on demand rather than the entire size of the virtual machine being physically reserved at creation time. vSphere also provides “thick” guest volumes in two formats: zeroedthick, in which storage is reserved at creation time but each block is only zeroed out (erased) just in time, as the guest first writes to it; and eagerzeroedthick, in which all the reserved storage is zeroed out when the guest is created. The two formats represent a trade-off between performance and security: zeroedthick disks are quicker to create but can leave stale data on the VMFS until each block is first written, while eagerzeroedthick disks take longer to create but avoid the cost of zeroing on first write. Hyper-V provides fixed-size (“thick”) VHDs or dynamically expanding VHDs.
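The differences between the three vSphere formats (thin, zeroedthick and eagerzeroedthick) come down to two questions: is the space reserved at creation, and is it zeroed at creation? The sketch below is a simplified model of those behaviours as described here, with hypothetical names; it does not model VMFS internals.

```python
from dataclasses import dataclass

GB = 2**30

@dataclass(frozen=True)
class DiskFormat:
    reserve_at_creation: bool  # is the full size reserved up front?
    zero_at_creation: bool     # is the reserved space zeroed up front?

FORMATS = {
    "thin": DiskFormat(reserve_at_creation=False, zero_at_creation=False),
    "zeroedthick": DiskFormat(reserve_at_creation=True, zero_at_creation=False),
    "eagerzeroedthick": DiskFormat(reserve_at_creation=True, zero_at_creation=True),
}

def creation_profile(fmt: str, size_bytes: int) -> dict[str, object]:
    """Summarise what happens at creation time for a disk of the given format."""
    f = FORMATS[fmt]
    return {
        "reserved_bytes": size_bytes if f.reserve_at_creation else 0,
        "zeroed_at_creation_bytes": size_bytes if f.zero_at_creation else 0,
        # Blocks not zeroed up front must be zeroed when first written.
        "zero_on_first_write": not f.zero_at_creation,
    }

for name in FORMATS:
    print(name, creation_profile(name, 100 * GB))
```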

As with thin provisioning in traditional environments, there are positives and negatives to using the technology in virtual environments. Thin provisioning within the hypervisor means many more virtual machines can be accommodated on disk, which is especially useful where virtual machines are deliberately over-allocated in size to cater for future growth. The downside of on-demand expansion, of course, is that the storage for a single virtual guest becomes dispersed across the datastore.

As each guest on a datastore or volume expands, it allocates space in 1MB or 2MB chunks, and there is no way to predict when any specific virtual machine will request its next chunk. This leads to a random and fragmented layout for the storage of an individual guest. The effect is particularly pronounced in virtual desktop environments, which generate a high degree of random I/O and can suffer performance problems when many virtual desktops are started at the same time.
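The fragmentation effect can be illustrated with a toy model. Assume, as a simplification, that the datastore hands out chunks sequentially in the order they are requested; a real VMFS or NTFS allocator is more sophisticated. Counting the contiguous runs (extents) each guest ends up with shows how interleaved growth scatters a guest's storage. The function name is hypothetical.

```python
def extents_per_vm(allocation_order: list[str]) -> dict[str, int]:
    """Count contiguous runs each VM's chunks form when chunks are
    allocated one after another in the order they were requested."""
    extents: dict[str, int] = {}
    previous = None
    for vm in allocation_order:
        if vm != previous:  # a new run starts whenever the owner changes
            extents[vm] = extents.get(vm, 0) + 1
        previous = vm
    return extents

# Three desktops growing at the same time: requests interleave.
interleaved = ["vd1", "vd2", "vd3"] * 2 + ["vd1", "vd1"]
print(extents_per_vm(interleaved))  # {'vd1': 3, 'vd2': 2, 'vd3': 2}

# The same chunks allocated one guest at a time stay contiguous.
sequential = ["vd1"] * 4 + ["vd2"] * 2 + ["vd3"] * 2
print(extents_per_vm(sequential))   # {'vd1': 1, 'vd2': 1, 'vd3': 1}
```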

One obvious question that arises from using thin provisioning is whether thin technologies should be implemented in both the hypervisor and the storage. There is no reason not to have thin provisioning in both places; the only recommendation is to ensure reporting and monitoring is in place to manage growth.
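Where thin provisioning is used at both layers, the same simple check can be applied to each: the datastore (virtual disk sizes against datastore capacity) and the array (thin LUN sizes against physical pool capacity). The sketch below is hypothetical; the 80% alert threshold and the example figures are illustrative assumptions, not recommendations from either vendor.

```python
def thin_report(provisioned_bytes: int, used_bytes: int, capacity_bytes: int,
                alert_used_pct: float = 80.0) -> dict[str, object]:
    """Report overcommitment and utilisation for one thin-provisioned layer."""
    overcommit = provisioned_bytes / capacity_bytes
    used_pct = 100.0 * used_bytes / capacity_bytes
    return {
        "overcommit_ratio": round(overcommit, 2),
        "used_pct": round(used_pct, 1),
        "alert": used_pct >= alert_used_pct,  # hypothetical threshold
    }

TB = 2**40
# Hypervisor layer: 6TB of thin virtual disks on a 4TB datastore, 3.5TB written.
print(thin_report(6 * TB, int(3.5 * TB), 4 * TB))
# {'overcommit_ratio': 1.5, 'used_pct': 87.5, 'alert': True}
# Array layer: 10TB of thin LUNs on an 8TB pool, 5TB written.
print(thin_report(10 * TB, 5 * TB, 8 * TB))
# {'overcommit_ratio': 1.25, 'used_pct': 62.5, 'alert': False}
```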