Podcast: Why you need clustered NAS and why it is ideal for big data

Cliff Saran and Antony Adshead

Clustered NAS, or scale-out NAS, is an important development in file-based storage. It marries the ease of implementation and management of NAS with massively scalable file system capability. That means NAS devices no longer need to sit in departmental silos and petabytes of file data can be managed from one screen.

In this interview Cliff Saran, managing editor (technology) at ComputerWeekly.com, speaks with Antony Adshead, storage editor at ComputerWeekly.com about clustered NAS, what it is good for and who sells it.

Cliff Saran: So, Antony, the first question I have for you today is, what is clustered NAS?

Antony Adshead: In short, clustered NAS, or scale-out NAS, is NAS that comes in nodes that you can add together to form clusters with a common file system.

That is the short definition; next comes the slightly longer version.

To explain it, we can contrast clustered NAS with what went before, i.e., traditional NAS – a great innovation that puts a large amount of storage on what is effectively a server on the network that is easy to install and manage.

But traditional NAS is limited in how far its performance and capacity can scale, and you risk a situation called NAS sprawl, which is characterised by siloed data in unconnected NAS filers scattered throughout departments and branch offices. You cannot see all the files on them from one place, so managing and migrating data between them is a pain, and you lack the ability to handle all your information en masse.

Clustered NAS puts a stop to this. You can grow a cluster in grid fashion and it has a massive common file system. Adding NAS devices means adding parts of a whole, with files spread across all devices and resilience built in.

So, if your storage becomes processor- or memory-bound, you can add a controller to gain more power without adding more disk. Or you can add disk that can be seen from all devices. Device failures, meanwhile, are non-disruptive.

What makes this clustered capability possible is a distributed or parallel file system that lets all nodes in a cluster see all the files in the environment.
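To make the idea of a common file system concrete, here is a minimal Python sketch of a toy cluster. It is an illustration only, not how OneFS, GPFS or any other product mentioned here places data: the `ToyCluster` class, its method names and the hash-based placement rule are all assumptions made for the example, and it leaves out the replication that gives real clusters their resilience.

```python
import hashlib


class ToyCluster:
    """Toy model of a scale-out NAS cluster with one shared namespace.

    Hypothetical illustration only: real products use their own
    placement, striping and protection schemes.
    """

    def __init__(self, node_names):
        # Each node holds its own share of the files: {path: data}.
        self.nodes = {name: {} for name in node_names}

    def _owner(self, path):
        # Assumed placement rule: hash the path and pick a node from it.
        names = sorted(self.nodes)
        digest = hashlib.sha256(path.encode("utf-8")).hexdigest()
        return names[int(digest, 16) % len(names)]

    def write(self, path, data):
        self.nodes[self._owner(path)][path] = data

    def read(self, path, via_node):
        # A client can ask any node; that node resolves where the file lives.
        if via_node not in self.nodes:
            raise KeyError(f"unknown node: {via_node}")
        return self.nodes[self._owner(path)][path]

    def list_all(self):
        # One view of every file in the cluster, whichever node stores it.
        return sorted(p for files in self.nodes.values() for p in files)

    def add_node(self, name):
        # Growing the cluster: add a node, then spread existing files again.
        existing = [(p, d) for files in self.nodes.values()
                    for p, d in files.items()]
        self.nodes[name] = {}
        for files in self.nodes.values():
            files.clear()
        for path, data in existing:
            self.write(path, data)


cluster = ToyCluster(["node-a", "node-b"])
cluster.write("/finance/q3.xls", b"figures")
cluster.write("/media/promo.mov", b"video")
cluster.write("/hr/policy.doc", b"text")

# The same file is reachable through either node.
same = (cluster.read("/hr/policy.doc", "node-a")
        == cluster.read("/hr/policy.doc", "node-b"))

before = cluster.list_all()
cluster.add_node("node-c")
after = cluster.list_all()
```

In this sketch the listing is identical before and after `add_node`, which is the point of the single namespace: the cluster grows, but what the user sees does not change.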

All of this is important for a number of reasons. To begin with, it means clustered NAS is way more flexible and scalable than traditional NAS. It puts an end to silos of NAS that simply cannot talk to each other and that are scattered around the organisation.

Clustered NAS is also achieving prominence at a time when organisations are focusing on what to do with large amounts of unstructured file data. This is, in part, what all the talk about big data is about – vast amounts of unstructured information, i.e., not in a database, and the challenge of searching and interrogating it to derive business value. Clustered NAS, with its ability to store very large volumes of data that is all visible via one file system, is ideal for this.

Saran: Who makes clustered NAS products and how do they differ?

Adshead: The scale-out NAS market has largely been divided between a number of clustered NAS specialists and the sub-brands of the big suppliers, with the two strands merging somewhat, via acquisition.

There has also been a significant milestone in terms of clustered NAS going mainstream recently, of which more in a minute. Firstly, let’s run through some examples of the main clustered NAS suppliers and brands.

Big supplier clustered NAS ranges include:

EMC’s Isilon family, which – with its OneFS file system-based hardware – can scale up to 15PB. Three different product lines come optimised for either IOPS, sequential throughput or capacity.

IBM, meanwhile, has its Scale-Out NAS or SONAS family, which is built on its GPFS (General Parallel File System) and Lintel servers. It supports billions of files and scales to around 20PB.

HP has put the Ibrix software it acquired in 2009 into HP servers to build the X9000 Network Storage System. This comes either as gateways to which customers can add their own disk or fully integrated devices, and can scale up to 16PB.

HDS acquired BlueArc in September last year. BlueArc’s big difference in the market is that instead of commodity CPUs, it uses dedicated field-programmable gate arrays (FPGAs) customised for scale-out NAS performance. BlueArc’s Titan Series can scale to 16PB with a single namespace.

Of the other big suppliers, I’ll get to NetApp in a minute.

But first among the start-ups is Avere – which sells appliances that accelerate access to existing traditional NAS devices and provides a single namespace for all the data on them. It uses flash storage to accelerate access and has been popular among organisations that have multiple locations and want access to data held remotely on NAS devices.

And there’s Panasas – which provides scale-out NAS products built on a blade architecture: director blades that deliver I/O and storage blades for capacity. A single Panasas namespace can scale to 6PB.

Finally, there’s NetApp – possibly the name most associated with NAS over the years. This year, it upgraded its OnTap operating system to provide true clustering capability; it has always had a kind of halfway-house clustering that was not really clustering.

But this year, the supplier’s FAS 2220 was the first of its filers to ship with OnTap 8.1.1, which has true clustering capability – potentially up to 20PB and 20 billion files – and you can download the upgrade for its other filers. At present, clustering is limited to five high-availability pairs, but the move is significant in that it shows clustered NAS going mainstream.

And that is a very good thing in my opinion; this kind of scalability should be a given in this type of product.

This was first published in October 2012

