Depending on which survey you look at, around 60% of all server workloads in the enterprise are now virtualised.
But virtual machines (VMs) – virtual servers and their close cousin, virtual desktop infrastructure (VDI) – create a workload very different from that seen on traditional storage arrays. As a result, we have seen the rise of platforms dedicated to VM storage.
The question is, why are dedicated VM storage appliances needed and what specific issues do they resolve? Are VM storage products destined to remain in a niche or are they how storage will look in the future?
Why we needed shared storage
The shared storage model arose from the need to address a number of key issues around the start of the millennium:
- Scalability – Servers deployed with their own storage simply couldn’t scale; as capacity was added to a server, the process involved downtime and potentially physical relocation of the server itself. SCSI-attached servers had to be deployed close to the storage array, resulting in growth restrictions.
- Availability – Although servers supported software and hardware RAID, each server had its own disks, so large organisations had to replace failed disks in servers on a daily basis.
- Management – Distributed storage resources require more planning and management than centralised solutions.
Storage networking – specifically with Fibre Channel – enabled server storage to be consolidated to large storage arrays or appliances. The result was a mixed workload (some batch, some database, with a mix of sequential and random I/O) suited for a general purpose storage array.
Virtualisation workloads impact storage
Virtual server environments have gained in popularity, from first being used in test/development environments to going mainstream for most production workloads.
Virtual server workloads typically have the following characteristics:
- High levels of random I/O – In the world of physical servers, it was usual that one physical box was dedicated to one application. In the virtual world that’s not the case and when many VMs read and write to disk at the same time, the result is a high level of random I/O activity. Traditional storage arrays are not great at managing random workloads; hard drives cope much better with sequential data.
- High peak demands – VDI especially can generate high volumes of read requests (when desktops are started in the morning, a “boot storm”) or of write requests (when desktops are shut down at the end of the day). The same can be said for virtual server environments at times of high reboot activity or data re-organisation.
- Platform-specific operations – Traditional storage environments used array-based replication and snapshots for data protection and disaster recovery. Virtualisation has moved those features away from the array and into the hypervisor layer. This results in an unpredictable I/O load as virtual machines are cloned and moved around the infrastructure. Features such as VAAI (VMware vSphere APIs for Array Integration) and Offloaded Data Transfer (ODX) in Microsoft Hyper-V have helped mitigate some performance issues.
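The random I/O effect described in the first point above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a model of any real array or hypervisor: each VM reads its own region of disk in strict sequence, and the storage system sees the requests in an arbitrary interleaved order. The function names, the 20-VM count and the block layout are all hypothetical.

```python
import random


def vm_stream(start_block, length):
    """Blocks requested by one VM: a purely sequential run."""
    return [start_block + i for i in range(length)]


def sequential_fraction(blocks):
    """Fraction of requests that hit the block right after the previous one."""
    if len(blocks) < 2:
        return 1.0
    hits = sum(1 for prev, cur in zip(blocks, blocks[1:]) if cur == prev + 1)
    return hits / (len(blocks) - 1)


def blend(streams, seed=0):
    """Interleave the per-VM streams at random, keeping each VM's own order."""
    rng = random.Random(seed)
    queues = [list(s) for s in streams]
    merged = []
    while any(queues):
        queue = rng.choice([q for q in queues if q])
        merged.append(queue.pop(0))
    return merged


# Hypothetical layout: 20 VMs, each reading 200 blocks from its own region.
streams = [vm_stream(vm * 1_000_000, 200) for vm in range(20)]
single = sequential_fraction(streams[0])
blended = sequential_fraction(blend(streams))

print(f"one VM alone: {single:.0%} sequential")
print(f"20 VMs blended: {blended:.0%} sequential")
```

Each VM on its own is fully sequential, but once the streams share one queue only a small fraction of requests follow directly on from the previous block, which is the pattern hard drives handle worst.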
As virtual environments have matured and taken on more production workloads, there has been a need to be able to prioritise I/O for servers. The traditional approach was to move VMs between storage tiers or LUNs, but this is not a practical approach and can impact performance.
The rise of VM storage
To meet the challenges of delivering VM storage, we have seen the emergence of two types of product: the hybrid storage/server appliance and VM-aware storage.
The hybrid solution combines storage and compute resources into a single appliance, which is deployed as a number of nodes rather than a single server. VMs and their associated storage are distributed across the cluster of nodes, which can be expanded by adding additional servers to the configuration.
The most obvious benefit of this solution is that there is no dedicated storage infrastructure to support. This can mean significant capex (capital expenditure) and opex (operational expenditure) savings in staff and hardware.
A cluster of multiple nodes distributes the storage and computing resources in a redundant manner. This provides two main benefits. First, if a node fails, the others are able to recover the workload. Second, I/O is distributed across all cluster members to make full use of the IOPS available on all disk spindles. This is especially useful for boot storms or other periods of high I/O activity.
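The two benefits can be sketched in a few lines. The example below is an illustration only and does not describe how any particular product places data: it assumes a simple round-robin scheme that keeps two copies of each VM's data on different nodes. The node names, VM names and the `place` and `survivors` helpers are hypothetical.

```python
from collections import Counter


def place(vms, nodes, copies=2):
    """Assign each VM's data to `copies` distinct nodes, round-robin.

    Assumes copies <= len(nodes), otherwise the copies would not be distinct.
    """
    placement = {}
    for i, vm in enumerate(vms):
        placement[vm] = [nodes[(i + k) % len(nodes)] for k in range(copies)]
    return placement


def survivors(placement, failed_node):
    """Nodes that still hold each VM's data after one node fails."""
    return {
        vm: [n for n in holders if n != failed_node]
        for vm, holders in placement.items()
    }


nodes = ["node-a", "node-b", "node-c", "node-d"]
vms = [f"vm-{i:02d}" for i in range(12)]

placement = place(vms, nodes)

# How many VM copies each node serves: an even spread shares the I/O load.
load = Counter(n for holders in placement.values() for n in holders)

# After losing one node, every VM should still have at least one copy.
after_failure = survivors(placement, "node-b")

print(dict(load))
print(all(len(holders) >= 1 for holders in after_failure.values()))
```

With two copies per VM, any single node can fail without a VM losing its last copy, and the even spread means a burst of I/O is served by every node rather than by one.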
VM-aware products deal directly with the files that comprise VMs and manage their requirements. In vSphere NFS deployments, virtual machines are made up of VMDKs (Virtual Machine Disks) and associated files that manage the configuration. Hyper-V uses the VHD format that encapsulates the entire VM in a single file. By understanding the contents of the VM, virtualisation-aware storage can implement prioritised and optimised I/O as well as advanced features such as cloning and replication, either under the direction of the hypervisor or independently of it.
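As a rough sketch of what per-VM prioritisation could look like, the example below queues requests by the priority of the VM that issued them rather than by LUN. It is an assumption-laden illustration, not any vendor's scheduler: the `VmAwareQueue` class, the VM names and the priority values are all hypothetical, and it uses strict priority ordering with first-in, first-out order within a priority level.

```python
import heapq
from itertools import count


class VmAwareQueue:
    """Dispatch I/O requests in order of the issuing VM's priority."""

    def __init__(self, priorities, default_priority=10):
        self._priorities = priorities      # VM name -> priority (lower = more urgent)
        self._default = default_priority   # used for VMs with no explicit priority
        self._heap = []
        self._seq = count()                # tie-breaker keeps FIFO order per priority

    def submit(self, vm, request):
        priority = self._priorities.get(vm, self._default)
        heapq.heappush(self._heap, (priority, next(self._seq), vm, request))

    def dispatch(self):
        _, _, vm, request = heapq.heappop(self._heap)
        return vm, request


queue = VmAwareQueue({"sql-prod": 0, "web-prod": 1, "test-01": 5})
queue.submit("test-01", "read blk 10")
queue.submit("sql-prod", "write blk 7")
queue.submit("web-prod", "read blk 3")
queue.submit("sql-prod", "read blk 8")

order = [queue.dispatch() for _ in range(4)]
for vm, request in order:
    print(vm, request)
```

Strict priority is the simplest policy to show, but it can starve low-priority VMs under sustained load; a real system would more likely use weighted shares or per-VM limits.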
Some VM storage products
Nutanix is probably the best known of the hybrid storage/compute products. Its newest cluster hardware platform, the NX-3000, uses the latest Intel Sandy Bridge processors, with 128GB or 256GB of RAM per node. Storage is provided by 400GB of PCIe SSD, 300GB of SATA SSD and 5TB of SATA HDD. A “starter kit” of four nodes (combined together to create a “block” in a 2U 19” form-factor chassis) is capable of supporting around 400 virtual servers or 900 to 1,200 virtual desktops. A cluster uses the Nutanix Distributed File System (NDFS) to store data across all nodes, which are connected using Gigabit Ethernet. NDFS manages data striping and replication as well as more advanced functions such as auto-tiering between solid state and SATA storage.
Scale Computing also offers a hybrid server/storage product. Its HC3 platform allows up to eight nodes to be combined into a single cluster. Each node comprises one quad-core Intel processor, 32GB of RAM and four SATA or SAS drives for a maximum storage capacity of 8TB. A fully configured eight-node cluster can support up to 100 virtual machines. Scale also offers dedicated storage nodes that can be used to provide additional storage capacity. The S-Series storage node adds up to 4TB of usable SATA drive capacity (with 2GB of cache), while the M-Series offers higher performance, with up to 1.2TB of SAS drives (and 32GB of cache) or 4TB of SATA (and 32GB of cache).
SimpliVity is another hybrid server/storage vendor that has recently released products to the marketplace. Its OmniCube hardware platform can be combined into a “Federation” of two or more nodes. Each node comprises 800GB of SSD and 24TB of HDD storage, two six-core Intel Xeon processors and up to 768GB of memory. The solution implements compression and data deduplication to improve storage utilisation, with a usable capacity of 20TB to 40TB.
Tintri offers a VM storage appliance tailored for VMware’s vSphere platform. The VMstore T445 is a single 4U controller with 8.5TB of raw storage capacity; the VMstore T540 is a 3U dual controller configuration with up to 13.5TB of usable capacity, with the storage mapped to the hypervisor as an NFS share. Both platforms support VMware vSphere 4.x and 5.x as well as VMware View, Citrix XenDesktop and Citrix XenApp for VDI. The Tintri system is able to provide highly granular reporting at a per-VM or per-vDisk level (a vDisk encapsulates the files for a single VM) and recent functionality has added virtual machine snapshots and remote replication.
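The consolidation figures quoted above for the Nutanix and Scale Computing clusters can be reduced to a rough VMs-per-node figure. The short calculation below uses only the numbers given in this article; the `vms_per_node` helper is a hypothetical name, and the result is a back-of-the-envelope comparison rather than a benchmark, since the nodes differ widely in processor, memory and storage.

```python
def vms_per_node(total_vms, nodes):
    """Average number of VMs each node would host in a full cluster."""
    return total_vms / nodes


# (quoted VM count, node count), taken from the figures in the text above.
quoted = {
    "Nutanix NX-3000 starter kit": (400, 4),
    "Scale Computing HC3, eight nodes": (100, 8),
}

density = {
    name: vms_per_node(total_vms, nodes)
    for name, (total_vms, nodes) in quoted.items()
}

for name, value in density.items():
    print(f"{name}: about {value:g} virtual servers per node")
```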