Server virtualisation and data growth changing storage-area network, network-attached storage

Will the demands of server virtualisation and data growth lead to a breakdown of existing storage categories such as SAN and NAS?

The data storage industry's old, seemingly fixed definitions of monolithic versus modular array and storage-area network (SAN) versus network-attached storage (NAS) are cracking just like the storybook character Humpty Dumpty. The triple onslaught of commodity hardware, server virtualisation and ballooning data growth is proving too much for the old standards. So, what is emerging from the wreckage?

The need for scalability is creating difficulties for existing modular arrays. A single box can no longer cope when many applications need more capacity than it can provide. Aggregating boxes together means making them function as a single system. That means cache coherency across boxes and, in the case of NAS, a global file system.

The scalability issue is not just about increasing amounts of data. With server virtualisation, the number of I/O requests from a single physical server is growing quickly, and the aggregation of physical servers into blade server farms means attached network storage has to deal with a huge number of I/O requests.

Virtual machines (VMs) can be provisioned and de-provisioned easily, with the hypervisor instructed to set them up and break them down on demand. Inevitably, some storage management functions are moving to the server, which will start controlling application access to data and possibly the security, protection and retention of that data, either directly or by policy.

How virtualisation will affect storage arrays

On the array side, there has to be a means of adding storage capacity smoothly and quickly. It seems obvious that future arrays will present virtualised pools of storage, which they carve into virtual logical unit numbers (LUNs) for block access or share out as file storage space. In such a scenario, if more capacity is needed, another array box (a rack of disk enclosures) is wheeled into the data centre, powered up, connected and detected, so that its resources can be assimilated transparently. When more storage I/O is needed, cores can be added to controllers, so that performance scales separately from capacity.
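To make the pooling idea concrete, here is a minimal sketch of a virtualised pool that carves LUNs from aggregated enclosures and grows when another box is added. The class and method names (StoragePool, add_enclosure, carve_lun) are hypothetical and do not correspond to any supplier's product or API; capacities are plain numbers in TB.

```python
from dataclasses import dataclass, field


@dataclass
class StoragePool:
    """Hypothetical virtualised pool spanning several array enclosures."""
    enclosures: dict = field(default_factory=dict)  # enclosure name -> capacity (TB)
    luns: dict = field(default_factory=dict)        # LUN name -> allocated size (TB)

    @property
    def capacity_tb(self):
        # Total capacity is the sum of every enclosure in the pool.
        return sum(self.enclosures.values())

    @property
    def free_tb(self):
        # Free space is whatever has not yet been carved into LUNs.
        return self.capacity_tb - sum(self.luns.values())

    def add_enclosure(self, name, capacity_tb):
        # A new box simply enlarges the pool; existing LUNs are untouched.
        if name in self.enclosures:
            raise ValueError(f"enclosure {name!r} already in pool")
        self.enclosures[name] = capacity_tb

    def carve_lun(self, name, size_tb):
        # Carve a virtual LUN out of the pool's free capacity.
        if name in self.luns:
            raise ValueError(f"LUN {name!r} already exists")
        if size_tb > self.free_tb:
            raise ValueError("pool exhausted: add another enclosure")
        self.luns[name] = size_tb


pool = StoragePool()
pool.add_enclosure("array-1", 40)
pool.carve_lun("vm-datastore", 30)
pool.add_enclosure("array-2", 40)   # wheel in a second box
pool.carve_lun("file-share", 20)    # now fits, spanning the enlarged pool
```

The sketch leaves out everything that makes this hard in practice, such as data placement, rebalancing and controller scaling. It only shows that consumers see pool capacity rather than individual boxes.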

What we're looking at here is some form of decoupling of array controllers from drive enclosures. 3PAR already has the notion of federating its eight-way InServ clusters to preserve the advantages of clustering without putting all of your data storage eggs in one cluster basket. Xiotech is thinking about treating its Emprise arrays as aggregable storage elements under the management of some virtualising controller. EMC's Symmetrix V-Max also incorporates scale-out elements. Hewlett-Packard (HP) is looking at a three-way divide between drive enclosures, controllers and higher-level storage applications in servers.

Roles are changing among the drives inside the array, with the very high-speed data access function moving to solid-state drives (SSDs) and the basic store-as-much-data-as-you-can function going to 1 TB and 2 TB SATA drives, with 3 TB, 4 TB and 8 TB models on suppliers' roadmaps. Spin-down on these drives looks likely to become a familiar option for low-access data storage. Today's expensive Fibre Channel (FC) drives look set to be replaced by cheaper SAS drives, with high IOPS numbers being provided by trays of 2.5-inch drives.

Array controllers are being given automated data movement capabilities to move fast-access data onto SSDs, medium-access data onto fast disk drives and low-access rate data onto SATA drives. Compellent does this by tracking block-level activity. EMC is going to add its FAST utility to move data at LUN level first and then sub-LUN level. Pillar Data Systems boss Mike Workman said we're going to need the ability to move data on command by systems administrators according to policies referring to data type or other metadata characteristics, and by blind block-level or sub-LUN-level activity tracking. One mode will not be enough.
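As a rough illustration of the two modes Workman describes, the sketch below places blocks on a tier either because an administrator's policy pins them there or, failing that, by counting accesses. The thresholds, tier names and function names are assumptions made for the example; this is not how Compellent, EMC's FAST or Pillar actually implement data movement.

```python
from collections import Counter

# Hypothetical thresholds: accesses per tracking window.
SSD_MIN_HITS = 100
FAST_DISK_MIN_HITS = 10


def tier_for(hits, pinned_tier=None):
    """Pick a tier for one block: policy first, then blind activity tracking."""
    if pinned_tier is not None:
        return pinned_tier          # administrator policy overrides activity
    if hits >= SSD_MIN_HITS:
        return "ssd"                # fast-access data
    if hits >= FAST_DISK_MIN_HITS:
        return "fast_disk"          # medium-access data
    return "sata"                   # low-access data


def plan_placement(access_log, policy=None):
    """Map each block ID to a tier.

    access_log: iterable of block IDs, one entry per access.
    policy: optional dict of block ID -> tier pinned by an administrator.
    """
    policy = policy or {}
    hits = Counter(access_log)          # missing blocks count as zero hits
    blocks = set(hits) | set(policy)    # include pinned blocks never accessed
    return {b: tier_for(hits[b], policy.get(b)) for b in blocks}


log = ["a"] * 150 + ["b"] * 20 + ["c"]
plan = plan_placement(log, policy={"c": "ssd", "d": "sata"})
```

Here block "a" lands on SSD and "b" on fast disk through activity alone, while "c" is promoted and "d" held on SATA by policy, whatever their access counts.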

As array controllers change from proprietary hardware into what are, in essence, embedded x86 servers, the same basic hardware can run different array controller software and turn the disk drives it looks after into an FC SAN, iSCSI SAN or filer. It's the same disk drives and the same controller hardware, but with different storage personalities layered on top. That's the way HP seems to be heading with its LeftHand Networks and EVA products.

EMC could conceivably go further with its product set by adding in its object-based Centera, which stores unstructured file data as objects in its content-addressed storage (CAS). The files are given a unique hash address based on their content and stored outside of a standard file system but within a type of directory structure.
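The content-addressing principle can be sketched in a few lines: the address of an object is a hash of its bytes, so identical content always maps to the same address and is stored once. The ContentStore class is hypothetical, and the use of SHA-256 is an assumption for illustration, not a description of Centera's actual addressing scheme.

```python
import hashlib


class ContentStore:
    """Hypothetical in-memory content-addressed store."""

    def __init__(self):
        self._objects = {}  # content address -> stored bytes

    def put(self, data: bytes) -> str:
        # The address is derived purely from the content (SHA-256 assumed here).
        address = hashlib.sha256(data).hexdigest()
        # Identical content hashes to the same address, so it is kept only once.
        self._objects.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        # Retrieval needs only the address, not a file path.
        return self._objects[address]

    def __len__(self):
        return len(self._objects)


store = ContentStore()
first = store.put(b"quarterly report")
second = store.put(b"quarterly report")   # same content, same address
other = store.put(b"meeting minutes")
```

Because the address depends only on content, storing the same file twice consumes no extra object, which is also the basis of single-instance storage.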

A future of logical boundaries

In short, we are looking at a future where the physical boundaries between FC SANs, iSCSI SANs, NAS filers, clustered high-performance computing (HPC) filers and CAS boxes are being replaced by logical boundaries. IBM, by the way, seems not to have adopted this idea yet.

How might all this look? There could be a set of controller hardware entities -- x86 server blades or boxes -- which link to VMs in servers and to back-end storage enclosures built from solid-state and commodity drives. Those enclosures would be aggregated into virtualised pools of capacity for individual controllers or sets of controllers with different personalities (SAN, file, object, archive, spin-down and so on).

We can then add other functions on top of this, such as a repository for disk backup data or a data archive. Policies can be applied to data for replication, single file instancing and data deduplication.

The growth of concepts such as cloud computing magnifies the argument for Internet-scale data centres. In such thinking, where operating, acquisition and power costs dominate, enterprises will prize the ability to use commodity hardware-based controllers and their underlying drive enclosures, and to grow them easily by scaling up in a cluster and scaling out through federation.

It's likely, though, that small- and medium-sized enterprises (SMEs) will stay with tried-and-trusted modular arrays for a good while. Their storage Humpty Dumpty is safe enough on the wall for a few years yet, but his enterprise big brother is wobbling and ready to fall off his perch.

BIO: Chris Mellor is storage editor at The Register.
