vSphere and storage: The ultimate guide

VMware's vSphere offers myriad storage functions ... and more than a few complications for storage administrators. In this story, we analyse vSphere's impact on storage and advise how to get the most from the software.

When VMware released vSphere in May 2009, it included more than 100 new features and enhancements, many of which addressed data storage. These data storage management enhancements include:

  • Thin provisioning enhancements
  • iSCSI improvements
  • Support for Fibre Channel over Ethernet (FCoE) and jumbo frames
  • New ability to hot extend virtual disks
  • New ability to grow VMFS volumes
  • New Pluggable Storage Architecture (PSA)
  • New paravirtualized SCSI adapters
  • New VMDirectPath for storage I/O devices
  • VMware Storage VMotion enhancements
  • New vStorage APIs
  • New storage views and alarms in vCenter Server

There are many vSphere storage enhancements and some may have a profound effect on your environment, so we'll look at each one in detail.

Thin provisioning enhancements

Thin provisioned disks are virtual disks that start small and grow as data is written to them. With a "thick" disk, all of its space is allocated when it's created; a thin disk, by contrast, starts out at 1 MB (or up to 8 MB, depending on the default block size) and then grows toward its defined maximum size as data is written to it by the guest OS.

Thin provisioning was available in Virtual Infrastructure 3 (VI3), but a number of changes make it more usable in vSphere:

  • Thin disks can be created using the vSphere client at the time a virtual machine (VM) is created; with VI3, the vmkfstools command line utility was used to create them.
  • Existing thick disks can be converted to thin disks using Storage VMotion while a VM is running; VI3 required powering off the virtual machine.
  • The vSphere client lets you see the actual size of thin disks (previously a command line function in VI3).
  • New configurable alarms in vCenter Server provide alerts for overallocation and usage percentages.
  • A new safety feature automatically suspends VMs with thin disks when free space is critically low.

These improvements make thin disks more manageable and much easier to use. Users often ask whether they should use VMware's thin disks when their storage array already supports thin provisioning ("thin-on-thin"). Use both if you can, but make sure you carefully monitor both the array and VMware to ensure that you have adequate space available. Another concern with thin disks is the impact on performance as they grow and the increased fragmentation that may occur. According to VMware, thin disks have a negligible effect on performance.
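The allocate-on-first-write behaviour of a thin disk can be sketched in a few lines of Python. This is a conceptual model only; the `ThinDisk` class and 1 MB block size are illustrative, not VMware's on-disk format:

```python
class ThinDisk:
    """Conceptual model of a thin-provisioned disk: space is
    allocated in block-sized chunks only when first written."""

    def __init__(self, max_size_mb, block_size_mb=1):
        self.max_size_mb = max_size_mb      # provisioned (defined) maximum
        self.block_size_mb = block_size_mb
        self.allocated = set()              # blocks actually backed by storage

    def write(self, offset_mb):
        if offset_mb >= self.max_size_mb:
            raise ValueError("write past defined maximum size")
        self.allocated.add(offset_mb // self.block_size_mb)

    @property
    def actual_size_mb(self):
        # What the datastore really consumes, vs. the provisioned maximum
        return len(self.allocated) * self.block_size_mb


disk = ThinDisk(max_size_mb=100)   # guest sees a 100 MB disk
disk.write(0)
disk.write(42)
print(disk.actual_size_mb)         # only 2 MB actually allocated
```

The gap between `max_size_mb` and `actual_size_mb` is exactly what the new vCenter Server overallocation alarms are watching.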


iSCSI improvements

iSCSI storage arrays have become a popular storage choice for virtual hosts due to their lower cost and acceptable performance. Using iSCSI software initiators has always resulted in a slight performance penalty vs. hardware initiators with TCP offload engines. For vSphere, VMware rewrote the entire iSCSI software initiator stack to use CPU cycles more efficiently and to improve throughput compared to VI3.

VMware enhanced the VMkernel TCP/IP stack, optimized the cache affinity and improved internal lock efficiency. Other iSCSI improvements include easier provisioning and configuration, and support for the bi-directional Challenge Handshake Authentication Protocol (CHAP), which improves security by requiring both the initiator and target to authenticate each other.
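Bidirectional CHAP is built on the handshake defined in RFC 1994: each side hashes an identifier byte, the shared secret and the peer's random challenge, so the initiator and the target must both prove knowledge of a secret. A minimal sketch of the computation (the secrets here are made up):

```python
import hashlib
import os


def chap_response(identifier: int, secret: bytes, challenge: bytes) -> bytes:
    """CHAP response per RFC 1994: MD5 over the identifier byte,
    the shared secret and the peer's random challenge."""
    return hashlib.md5(bytes([identifier]) + secret + challenge).digest()


# Bidirectional CHAP: each side issues its own challenge, so both
# peers must authenticate. Secrets below are illustrative only.
initiator_secret = b"initiator-secret-12chars"
target_secret = b"target-secret-12characters"

# Target authenticates the initiator...
challenge1 = os.urandom(16)
resp1 = chap_response(1, initiator_secret, challenge1)
assert resp1 == chap_response(1, initiator_secret, challenge1)  # target verifies

# ...and the initiator independently authenticates the target.
challenge2 = os.urandom(16)
resp2 = chap_response(2, target_secret, challenge2)
assert resp2 == chap_response(2, target_secret, challenge2)     # initiator verifies
```

Because each challenge is random, a captured response can't be replayed against a later session.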

Support for FCoE and jumbo frames

vSphere added support for newer storage and networking technologies, including:

  • Fibre Channel over Ethernet. vSphere now supports FCoE on converged network adapters (CNAs).
  • Jumbo frames. Conventional Ethernet frames carry a 1,500-byte payload (1,518 bytes on the wire); jumbo frames typically carry 9,000 bytes, which can improve network throughput and CPU efficiency.

VMware added jumbo frame support in ESX 3.5 but didn't officially support it for use with data storage protocols. With the vSphere release, VMware officially supports using jumbo frames with software iSCSI and NFS storage devices using 1 Gbps or 10 Gbps network interface cards (NICs).
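The throughput and CPU argument for jumbo frames is simple arithmetic: a larger payload per frame means fewer frames, and therefore fewer per-frame headers and interrupts, for the same amount of storage traffic. A back-of-the-envelope sketch (ignoring TCP/IP header overhead for simplicity):

```python
def frames_needed(payload_bytes: int, mtu: int) -> int:
    """Number of Ethernet frames needed to carry a payload at a
    given MTU (ceiling division; higher-layer headers ignored)."""
    return -(-payload_bytes // mtu)


transfer = 9_000_000                    # 9 MB of storage traffic
std = frames_needed(transfer, 1500)     # standard 1,500-byte payload
jumbo = frames_needed(transfer, 9000)   # jumbo 9,000-byte payload
print(std, jumbo)                       # 6000 vs. 1000 frames
```

Six times fewer frames means six times fewer per-frame framing and interrupt costs for the same data, which is where the CPU savings come from.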

Hot extend virtual disks

With VMware vSphere you can increase the size of an existing virtual disk (VMDK file) while it's powered on as long as the guest operating system supports it.

Once you increase the size of a virtual disk, the guest OS can then begin using it to create new disk partitions or to extend existing ones. Windows Server 2008, Windows Server 2003 Enterprise and Datacenter editions, and certain Linux distributions support this feature. Previously, you had to power down a VM before increasing its virtual disk size.

Grow VMFS volumes

With vSphere you can increase the size of VMFS volumes without using extents and without disrupting virtual machines. To do this in VI3, you had to join a separate logical unit number (LUN) to the VMFS volume as an extent, which had some disadvantages. vSphere lets you grow the LUN of an existing VMFS volume using your storage-area network (SAN) configuration tools and then expand the VMFS volume so it uses the additional space.

This means you don't have to use extents, and you can avoid the old workaround of moving VMs to another datastore, destroying the existing VMFS volume and recreating a larger one.

Pluggable Storage Architecture

VMware has given vSphere a new modular storage architecture that allows third-party vendors to interface with certain storage functionality. The Pluggable Storage Architecture (PSA) allows vendors to create plug-ins for controlling storage I/O functions like multipathing.

There's built-in functionality that allows for fixed or round-robin path selection when multiple paths to a storage device are available. Vendors can expand on this and develop their own plug-in modules for optimal performance through load balancing and more intelligent path selection. To achieve this, the PSA leverages the new capabilities provided by the vStorage APIs for multipathing.
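The difference between the built-in fixed and round-robin policies can be sketched as follows. The path names mimic vSphere's runtime naming, but the classes are purely illustrative and are not the PSA plug-in interface:

```python
import itertools


class FixedPathPolicy:
    """Always send I/O down the preferred path while it's up."""

    def __init__(self, paths, preferred=0):
        self.paths, self.preferred = paths, preferred

    def select(self):
        return self.paths[self.preferred]


class RoundRobinPolicy:
    """Rotate I/O across all available paths to spread the load."""

    def __init__(self, paths):
        self._cycle = itertools.cycle(paths)

    def select(self):
        return next(self._cycle)


paths = ["vmhba1:C0:T0:L0", "vmhba2:C0:T0:L0"]
fixed = FixedPathPolicy(paths)
rr = RoundRobinPolicy(paths)
print([fixed.select() for _ in range(4)])   # same path every time
print([rr.select() for _ in range(4)])      # alternates between paths
```

A vendor's path selection plug-in replaces a policy like this with logic that weighs queue depth, path latency or array-specific knowledge before choosing a path.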


Paravirtualized SCSI adapters

Paravirtualization is a technology, available for certain operating systems, in which a special driver communicates directly with the hypervisor. Without paravirtualization, the guest OS doesn't know about the virtualization layer, and privileged calls are trapped by the hypervisor using binary translation.

Paravirtualization allows for greater throughput and lower CPU utilization for virtual machines, and is useful for disk I/O-intensive applications. Paravirtualized SCSI (PVSCSI) adapters are separate storage adapters intended for non-primary OS partitions; they're enabled by editing a VM's settings and turning on the paravirtualization feature.

This may sound similar to VMDirectPath, but the key difference is that paravirtualized SCSI adapters can be shared by multiple VMs on host servers and don't require dedicating a single adapter to a single virtual machine.

VMDirectPath for storage I/O devices

VMDirectPath is similar to paravirtualized SCSI adapters in that a VM can directly access host adapters and bypass the virtualization layer for better throughput and reduced CPU utilization. But with VMDirectPath, you must dedicate an adapter to a VM, and it can't be used by any other virtual machines on that host.

VMDirectPath is available for specific models of both network and storage adapters; however, only the network adapters are currently fully supported in vSphere, while storage adapters have only experimental support. Like PVSCSI adapters, VMDirectPath can be used for VMs that have very high storage or network I/O requirements, such as database servers. VMDirectPath enables virtualization of workloads that you might previously have kept physical. A downside to using VMDirectPath is that you can't use features like VMware VMotion and Distributed Resource Scheduler (DRS).

Storage VMotion enhancements

VMotion moves a running VM from one host to another, leaving its storage location intact; Storage VMotion (SVMotion) does the opposite, keeping the VM on the same host and changing only its storage location. SVMotion was first introduced in ESX Version 3.5, but was only available as a command line utility. In vSphere, it's integrated in the vSphere Client, allowing quick and easy SVMotion moves. In addition, SVMotion now allows thick-to-thin disk conversion (and vice versa).

SVMotion can also be used to shrink a thin disk back down after data has been deleted from it. Typically, you use Storage VMotion to move a VM to another storage device; however, you can also leave the VM on its current storage device when performing a disk conversion. SVMotion can be invaluable when performing data storage maintenance, as running virtual machines can be easily moved to other storage devices.

Some under-the-covers enhancements make the whole migration process much more efficient. Instead of using a snapshot when copying the disk to its new location and then committing it when the operation is complete, SVMotion now uses a new changed block tracking feature to keep track of blocks that change after the copy starts, and then copies just those blocks once the initial copy completes.
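That copy-then-reconcile flow can be sketched as follows. The data structures are illustrative: a dict stands in for the virtual disk, and a callback simulates guest writes that land while the bulk copy is still running:

```python
def svmotion_copy(source, record_writes):
    """Sketch of the SVMotion copy phase: bulk-copy every block,
    then re-copy only the blocks that changed while the bulk copy
    was running (no snapshot commit, unlike the VI3 approach)."""
    dest = {}
    dirty = set()

    # Phase 1: bulk copy; the VM keeps writing, dirtying blocks.
    for block, data in list(source.items()):
        dest[block] = data
        dirty |= record_writes(source)   # changed block tracking notes writes

    # Phase 2: copy only the tracked changed blocks.
    for block in dirty:
        dest[block] = source[block]
    return dest


source = {0: "a", 1: "b", 2: "c"}
writes = iter([{(1, "B")}, set(), set()])   # one guest write mid-copy


def record_writes(disk):
    changed = next(writes, set())
    for blk, data in changed:
        disk[blk] = data                    # the guest overwrites block 1
    return {blk for blk, _ in changed}


result = svmotion_copy(source, record_writes)
print(result == source)   # True: destination converged with the live disk
```

Without phase 2, the destination would still hold the stale copy of block 1 that was read before the guest overwrote it.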

Changed block tracking

Changed block tracking (CBT) is a significant new data storage feature that's especially important for backup, replication and other data protection applications. vSphere's VMkernel can now track which disk blocks of a virtual machine have changed since a particular point in time. By tapping the VMware vStorage APIs for data protection, applications can get that information from the VMkernel rather than figuring it out on their own. CBT also enables near-real-time continuous data protection (CDP) when replicating VM disk files, and it can speed up incremental backups because backup apps can easily find out which changed blocks need to be backed up. Restores are easier, too, because backup apps know exactly which blocks need to be put back on the virtual disk for the selected restore point.

CBT is disabled by default because there's a slight performance overhead associated with it. It can be enabled only on the VMs that require it by adding a configuration parameter to the VM; backup applications that support changed block tracking can also enable it on VMs. Once enabled, CBT stores information about changed blocks in a special -ctk.vmdk file that's created in each VM's home directory. To do this, CBT uses changeIDs, which are unique identifiers for the state of a virtual disk at a particular point in time. New changeIDs are created anytime a snapshot of a VM is taken by a backup application. Using the changeID, a backup application knows which blocks have changed since the last backup. CBT is only supported on VMs with virtual machine hardware Version 7 (new to vSphere), so older VMs will need their virtual hardware upgraded to use CBT.
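The changeID mechanism can be modelled conceptually like this; the class is purely illustrative and is not VMware's implementation:

```python
class ChangeTracker:
    """Conceptual sketch of changed block tracking: every write is
    stamped with a monotonically increasing change ID, and a backup
    app can ask for 'blocks changed since change ID X'."""

    def __init__(self):
        self._next_id = 0
        self._last_change = {}   # block number -> changeID of last write

    def write(self, block):
        self._next_id += 1
        self._last_change[block] = self._next_id
        return self._next_id

    def changed_since(self, change_id):
        # Only blocks written after the given point in time
        return sorted(b for b, cid in self._last_change.items()
                      if cid > change_id)


cbt = ChangeTracker()
cbt.write(3)
cbt.write(7)
baseline = cbt.write(7)             # full backup taken here; remember this ID
cbt.write(12)
cbt.write(3)
print(cbt.changed_since(baseline))  # [3, 12] -- only these need backing up
```

An incremental backup simply asks for `changed_since(baseline)` instead of scanning the whole virtual disk, which is why CBT speeds up both backup and restore.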

New vStorage APIs

vStorage APIs are a collection of interfaces that third-party vendors can use to seamlessly interact with data storage in vSphere. There are four categories of vStorage APIs:

Array integration. These APIs are being co-developed with specific storage vendors. When complete, they will allow access to array-oriented capabilities such as array-based snapshots, hardware-offloaded storage device locking, integration between VMware and array-level thin provisioning, storage provisioning, data replication and more. In some cases, greater efficiencies may be realized by allowing the storage array to perform certain operations. VM cloning or template-based deployment can be hardware accelerated by array offloads rather than file-level copy operations at the ESX server. Storage VMotion would be able to leverage the storage array features to copy data more rapidly and with less ESX host impact; and rather than the traditional "SCSI lock" mechanism used by VMware, the array can lock only specific blocks being updated, which dramatically increases the number of VMs that can be deployed on a data store.

Multipathing. These APIs are used by the Pluggable Storage Architecture to allow storage vendors to more intelligently use multipathing for better storage I/O throughput and storage path failover. Storage vendors must certify their multipathing extension modules with VMware for use with ESX(i). There are several vStorage multipathing APIs: A path selection plug-in (PSP) can extend the path selection algorithms for any given I/O; a storage array-type plug-in (SATP) allows new/changed path discovery and ongoing path state management; and a multipathing plug-in (MPP) can extend the entire path management model of vSphere, including path management and path selection.

Site Recovery Manager (SRM). These APIs are used to integrate SRM with array-based replication for block and NAS storage to allow SRM to seamlessly handle both VM and host failover and storage replication failover. They also allow SRM to control the underlying array-based replication that it relies on.

Data protection. These APIs replace VMware Consolidated Backup (VCB) that was introduced in VI3. While they include VCB functionality, they also add new features such as changed block tracking and the ability to directly interact with the contents of virtual disks via the VDDK. These APIs are for backup and data protection application vendors to provide better integration.






vCenter Server's new storage views and alarms

VMware also improved storage-related reporting and alarms in vCenter Server. The most conspicuous is a new storage view that shows detailed information on storage metrics. Alarms have been expanded to include specific storage-related issues like datastore overcommitment and low disk space.

vCenter Server's storage view is a plug-in that must be installed and enabled. Once enabled, an additional Storage View tab will appear in the right pane after selecting any object in the left pane. The storage view has selectable columns that display information such as the total amount of disk space a VM is using (including snapshots, swap files, etc.), total amount of capacity used by snapshots, total amount of space used by virtual disks and other capacity usage statistics.

This is a great tool to quickly see how much space is being used in your environment for each component and to easily monitor snapshot space usage. There's also a map view to see relationships between virtual machines, hosts and storage components.

In VI3, the only storage alarm was for host or VM disk usage (in KBps). Hundreds of new alarms have been added, with many of them related to storage, such as an alert for a datastore that's close to running out of free space. This is especially important when you have a double threat from both snapshots and thin disks that can grow and use up all the free space. Other storage-related alarms include:

  • Datastore disk overallocation percent
  • Datastore state to all hosts
  • Datastore created/increased/deleted/discovered/expanded
  • Degraded storage path redundancy
  • Lost storage connectivity
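The thin-provisioning-related alarms above boil down to threshold checks on a datastore's capacity, used space and provisioned space. A sketch with illustrative threshold values (vCenter Server's actual defaults and alarm names may differ):

```python
def datastore_alarms(capacity_gb, used_gb, provisioned_gb,
                     usage_warn=0.75, overalloc_warn=2.0):
    """Evaluate two storage alarm conditions: space actually used on
    disk, and total space promised to thin disks (overallocation).
    Threshold values here are illustrative, not vCenter defaults."""
    alarms = []
    if used_gb / capacity_gb >= usage_warn:
        alarms.append("datastore usage on disk")
    if provisioned_gb / capacity_gb >= overalloc_warn:
        alarms.append("datastore disk overallocation")
    return alarms


# 1 TB datastore, 800 GB written, 2.5 TB promised to thin disks:
print(datastore_alarms(1000, 800, 2500))   # both alarms fire
```

The overallocation check is the one that matters for thin-on-thin setups: a datastore can look comfortably empty while the thin disks on it are collectively promised far more space than exists.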

vSphere does storage better

In addition to vSphere's major enhancements for storage operations, there are many smaller improvements not covered here. Taken together, these enhancements provide better performance, improved usability and easier administration. And they may be compelling enough to convince current VMware users to upgrade to vSphere.
