Virtualising computing resources has achieved mainstream status, with most organisations implementing server virtualisation via products such as VMware vSphere and Hyper-V within their IT environments. That technology’s storage cousin, block storage virtualisation, doesn’t get as much press, but it’s mature and is in use by a significant minority of IT departments: about 18% of the 302 respondents to the 2011 SearchStorage.co.UK European Storage Purchasing Intentions survey reported that they’ve virtualised at least some of their block storage. But what exactly is storage virtualisation, how does it work and how can you best implement it in your enterprise?
In the storage context, virtualisation means presenting virtual storage devices (LUNs and volumes) to a host in a way that obscures the view of the underlying storage resources and how they are configured.
It’s not the same thing as an internal cloud. In fact, storage virtualisation for block devices has been around for more than 20 years, since the introduction by EMC of the Integrated Cache Disk Array (ICDA) under the Symmetrix brand name. This storage array abstracted the underlying physical disk by presenting logical LUNs to host servers. However, today the definition of storage virtualisation has expanded to encompass virtualisation of resources on three levels:
- Host. Virtualisation at this layer includes the use of logical volume manager software, which takes physical storage and presents it as logical LUNs.
- Network. This layer refers to devices that sit within the network and perform virtualisation out-of-band. This means they redirect data from the host to a specific LUN without storing or caching the contents.
- Array or appliance. This layer is probably the most commonly implemented and includes storage arrays and dedicated appliances that present virtualised storage. The underlying resources may come from internal disk, externally connected disks or arrays, or both.
We will look at each of these block storage virtualisation scenarios in more detail.
Host-based storage virtualisation
Host-based solutions use logical volume manager (LVM) software to virtualise either the physical storage within a server or the storage presented to it. The LVM operates as an abstraction layer between the physical disk and the logical LUNs presented to the host. Most LVMs allow the physical disks to be split into partitions that can be recombined into logical LUNs to improve performance or resiliency. Host-based storage virtualisation is a good solution where SAN storage is not already in use, as it enables features such as wide striping, software RAID, snapshots and replication to be implemented. As server virtualisation spreads, separate LVM software is needed less often, because hypervisors increasingly provide these functions themselves.
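To make the abstraction layer concrete, here is a minimal Python sketch of how a striped logical volume might map a logical block number onto one of several physical disks. It is an illustration only, not how any particular LVM is implemented; the class name, the device names (sda, sdb, sdc) and the stripe size are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class StripedVolume:
    """A logical volume striped across several physical disks (RAID-0 style)."""
    disks: Tuple[str, ...]   # hypothetical physical device names
    stripe_blocks: int       # number of blocks in one stripe unit

    def locate(self, logical_block: int) -> Tuple[str, int]:
        """Map a logical block number to (physical disk, block on that disk)."""
        if logical_block < 0:
            raise ValueError("logical block must be non-negative")
        # Which stripe unit the block falls in, and where inside that unit.
        stripe_index, offset = divmod(logical_block, self.stripe_blocks)
        # Stripe units are dealt out to the disks round-robin.
        disk_index = stripe_index % len(self.disks)
        stripe_on_disk = stripe_index // len(self.disks)
        return self.disks[disk_index], stripe_on_disk * self.stripe_blocks + offset


volume = StripedVolume(disks=("sda", "sdb", "sdc"), stripe_blocks=4)
for block in (0, 3, 4, 8, 12):
    print(block, volume.locate(block))
```

The host only ever addresses logical blocks; consecutive stripe units land on different disks, which is where the performance benefit of wide striping comes from.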
Network-based storage virtualisation
Network-based solutions operate within the fabric layer of a SAN and are usually implemented out-of-band or using a split-path architecture (SPA). This means they don’t cache a copy of the data from the host but merely redirect the I/O request onto the physical device to which the logical LUN is mapped. The virtualisation device intercepts all command I/O requests (such as SCSI queries) and responds to those directly. Redirecting I/O, rather than using the in-band method of “store and forward”, has negligible impact on performance. Examples of network-based storage virtualisation products include EMC’s Invista and iNSP from Incipient, whose technology was acquired in 2009 by Texas Memory Systems. Network-based storage virtualisation has not been as successful as the most common solution, array/appliance-based virtualisation.
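The split between the control path and the data path can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the design of any product named above: the SplitPathDirector class, the Extent record and the device name "array-A:lun7" are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Extent:
    """Where a virtual LUN lives: a physical device, a start block and a length."""
    device: str
    start_block: int
    length: int


class SplitPathDirector:
    """Hypothetical out-of-band mapper: answers queries itself, redirects data I/O."""

    def __init__(self, mapping: Dict[str, Extent]) -> None:
        self._mapping = mapping

    def handle_inquiry(self, virtual_lun: str) -> Dict[str, object]:
        """Control path: respond directly from the mapping table."""
        extent = self._mapping[virtual_lun]
        return {"lun": virtual_lun, "size_blocks": extent.length}

    def redirect(self, virtual_lun: str, block: int) -> Tuple[str, int]:
        """Data path: translate the address only; no payload is stored or cached."""
        extent = self._mapping[virtual_lun]
        if not 0 <= block < extent.length:
            raise IndexError("block outside virtual LUN")
        return extent.device, extent.start_block + block


director = SplitPathDirector({"vlun0": Extent("array-A:lun7", 1000, 500)})
print(director.handle_inquiry("vlun0"))
print(director.redirect("vlun0", 10))
```

Note that redirect never sees the data itself, only the address. That is the reason the out-of-band approach adds so little latency compared with store and forward.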
Array/appliance-based storage virtualisation
Array/appliance-based storage virtualisation solutions connect to external disk resources (for example, another storage array) and present this storage as if it were part of the virtualising “engine” itself. The two hardware examples of this technology in the market today are USP/VSP from Hitachi (also sold as the P9500 from HP) and the SAN Volume Controller (SVC) from IBM (SVC technology has also been integrated into IBM’s Storwize products). There are also software-only solutions providing similar functionality, including DataCore’s SANsymphony-V.
The array/appliance solutions operate in-band, using the store-and-forward method. This means I/O requests are cached by the virtualisation engine with the option to acknowledge the I/O to the originating host before writing (or destaging) that I/O to physical disk. Array-based storage virtualisation solutions have a number of benefits:
- Extended functionality. They extend the functionality of one storage array to other arrays. This allows features such as replication, snapshots and thin provisioning to be implemented on the external storage, which may not already support or be licensed for those features. One good example of where this is used today is within the VSP (Virtual Storage Platform) array from Hitachi. The VSP supports vStorage APIs for Array Integration (VAAI), a set of VMware APIs for improving storage I/O performance with vSphere, so external arrays virtualised behind it can benefit from VAAI even if they do not support it themselves.
- Extending legacy assets. Storage virtualisation can be used to extend the life of legacy or older storage arrays. This is achieved through simplification of the support matrix for host and fabric connectivity. Legacy storage only connects to the virtualising array, has no direct connection to the host and therefore doesn't need to be considered when checking for host compatibility. Because of this, hosts can be at a higher level of support (including drivers) than the legacy array.
- Migration support. Storage virtualisation can be used for data migration. This is achieved by virtualising an existing storage array, then using the virtualisation engine to perform migrations onto either internal storage or another connected array. The benefit of using this method for migrations is in the level of disruption to service. Once an initial outage is taken to put the virtualisation engine in place, data can be moved around transparently and without any further outage. Clearly, there is an issue if the virtualisation engine itself needs to be replaced, but SVC (and, in the future, USP/VSP) can perform an in-place virtualisation engine replacement.
- Cost reduction. Where cheaper storage is virtualised, the overall cost of a storage solution can be lowered. Savings can be found in both the cost of hardware and in licensing, as the virtualised arrays will need minimal software enabled. Storage virtualisation effectively offers the ability to tier data at the LUN level, potentially offering more flexibility than traditional solutions. Of course, it may not always be possible to achieve cost savings if a licence charge is made for every gigabyte of storage virtualised.
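The migration benefit described above can be illustrated with a small Python sketch. It assumes a deliberately simplified engine in which a virtual LUN keeps one name towards the host while its backing array changes underneath. The VirtualLun class and the array names are hypothetical, and a real engine would also have to handle writes that arrive while the copy is running.

```python
from typing import Dict


class VirtualLun:
    """Hypothetical virtual LUN: the host sees `name`; the backend can change."""

    def __init__(self, name: str, backend: str,
                 stores: Dict[str, Dict[int, bytes]]) -> None:
        self.name = name
        self.backend = backend
        self._stores = stores  # backend name -> {block number: data}

    def read(self, block: int) -> bytes:
        return self._stores[self.backend].get(block, b"")

    def write(self, block: int, data: bytes) -> None:
        self._stores[self.backend][block] = data

    def migrate(self, target: str) -> None:
        """Copy every block to the target backend, then switch the mapping."""
        source_blocks = self._stores[self.backend]
        self._stores.setdefault(target, {}).update(source_blocks)
        self.backend = target  # the host-visible name never changes


stores: Dict[str, Dict[int, bytes]] = {"legacy-array": {}, "new-array": {}}
lun = VirtualLun("vlun0", "legacy-array", stores)
lun.write(0, b"payroll")
lun.migrate("new-array")
print(lun.name, lun.backend, lun.read(0))
```

Because the host addresses only vlun0, the move from the legacy array to the new one needs no host-side change. The same indirection is what allows data to be tiered at the LUN level.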
There are some drawbacks that should be considered when implementing array-based virtualisation:
- Performance. Using lower-cost arrays in place of traditional storage may result in performance issues, including cache overruns. It is essential to ensure any virtualised solution is capable of managing the expected workload.
- Data integrity. Array-based solutions cache host I/O. In the case of a hardware failure (which could occur in either the virtualisation engine or the external storage), recovery could be made more complex.
- Complexity. Virtualised solutions can become very complex and so should be implemented with standards in place.
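The data integrity concern follows directly from the store-and-forward method. The Python sketch below models a write-back cache in the simplest possible terms: a write is acknowledged as soon as it is in cache, and only reaches the external array when it is destaged. The WriteBackCache class is an assumption made for illustration; real engines typically protect cached writes with mirroring or battery backup, which is not modelled here.

```python
from typing import Dict, List, Optional


class WriteBackCache:
    """Hypothetical in-band engine: acknowledge writes from cache, destage later."""

    def __init__(self) -> None:
        self.backend: Dict[int, bytes] = {}  # stands in for the external array
        self.dirty: Dict[int, bytes] = {}    # acknowledged but not yet destaged

    def write(self, block: int, data: bytes) -> str:
        self.dirty[block] = data
        return "ack"  # the host sees completion before the data reaches disk

    def read(self, block: int) -> Optional[bytes]:
        # Serve the newest copy: cached data wins over what is on disk.
        if block in self.dirty:
            return self.dirty[block]
        return self.backend.get(block)

    def destage(self) -> int:
        """Flush all cached writes to the backend; return how many were written."""
        count = len(self.dirty)
        self.backend.update(self.dirty)
        self.dirty.clear()
        return count

    def at_risk(self) -> List[int]:
        """Blocks that would need recovery if the engine failed right now."""
        return sorted(self.dirty)


engine = WriteBackCache()
engine.write(10, b"order-1")
engine.write(11, b"order-2")
print("at risk before destage:", engine.at_risk())
engine.destage()
print("at risk after destage:", engine.at_risk())
```

Any block listed by at_risk has been acknowledged to the host but exists only in the engine's cache, which is why a failure in either the engine or the external storage can complicate recovery.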
In summary, block storage virtualisation is a well-established technology. It can be used from the host to the array and provides benefits in performance, availability, cost reduction and the migration process. We are seeing a trend away from host- and network-based solutions toward using array-based virtualisation as the primary solution. As server virtualisation continues to grow, we will see further integration between the platforms and more host-based virtualisation features integrated into hypervisors.