
NAS vs object: Which one for large volumes of unstructured data?

Both NAS and object storage offer highly scalable file storage for large volumes of unstructured data, but which is right for your environment?

Object storage is a fashionable topic, boosted by its massive scale-out capability and its related ability to handle very large amounts of unstructured data – object technology now underpins much cloud storage, for example.

However, file-based network-attached storage (NAS) remains widely used and sees continued development with the advent of clustered NAS, and it too is targeted at use cases that involve large amounts of unstructured data. 

So how do you differentiate and choose between the two? Will everything trend towards object storage, or are there application areas where NAS will remain supreme? Or is this a false dichotomy, with object storage and NAS merely being two views on the same thing?

We are seeing the two overlap more and more. Many object storage systems also offer file (and block) interfaces, while high-end NAS employs many of the same infrastructure elements that make object storage possible, most notably scale-out technology. We even have systems, such as NetApp’s latest iteration of StorageGrid, that allow you to write data as a file and read it back as an object.

Indeed, there are strong grounds to argue that object storage is merely file storage done right. After all, the original NAS file systems were something of a bodge and still have issues, despite being upgraded and updated over the years.

For example, even though we have migrated from the 8.3 filenames format imposed by MS-DOS to the flexible formats allowed today, we can still fool computers into running malware by giving a file a different extension.

Some in the business have even suggested the proponents of object storage did it a disservice by giving it a new name. Had they instead called it an enhanced file system, it would have looked a lot less scary and unfamiliar to many potential users. Of course, it might also have looked a lot less innovative and intriguing to others.

NAS vs object: Balancing the scales

The first scale-related aspect to consider is that the larger, older and more unstructured your data store is, the more likely it is to be suited to object storage. Conversely, NAS may be a simpler and better-performing option for fast-changing data or small stores.

Object storage enables enterprises and service providers to manage multi-petabyte secondary storage with relative ease. In that role it does not directly compete with traditional file and block storage, which serve frequently accessed data and transactional workloads.

In addition, when we refer to storage performance we usually think in terms of speed, latency and throughput in the datacentre. This is very different to the cloudy world of distributed applications and clients, where mobile devices typically access data over long distances and from widely disparate locations.

The second differentiator is geographic scale. In the distributed world we need distributed storage performance and throughput. This is something that distributed object storage architectures can supply effectively, thanks to a combination of fast and reliable object streaming, load balancing and various caching mechanisms that together support a multitude of concurrent clients. Add REST-based protocols such as Amazon S3, and object becomes particularly efficient as storage for remote devices.
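To make the access model concrete, here is a minimal sketch of what an object interface looks like to a client: whole objects are written and read by key, each carrying its own metadata, in a flat namespace. This is a toy in-memory model, not the Amazon S3 API; the class and method names (`ToyObjectStore`, `put`, `get`, `list`) are assumptions made up for illustration.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    data: bytes
    metadata: dict = field(default_factory=dict)
    checksum: str = ""


class ToyObjectStore:
    """Hypothetical in-memory stand-in for a bucket: a flat key -> object map."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data, **metadata):
        # Whole-object write: an existing object is replaced, not edited in place.
        checksum = hashlib.sha256(data).hexdigest()
        self._objects[key] = StoredObject(data, dict(metadata), checksum)
        return checksum

    def get(self, key):
        # Whole-object read by key; raises KeyError if the key is unknown.
        return self._objects[key]

    def list(self, prefix=""):
        # "Folders" are only a naming convention over the flat key space.
        return sorted(k for k in self._objects if k.startswith(prefix))


store = ToyObjectStore()
store.put("photos/2016/a.jpg", b"...image bytes...", content_type="image/jpeg")
print(store.list("photos/"))
print(store.get("photos/2016/a.jpg").metadata)
```

Because each request names a complete object and carries everything needed to serve it, this style of interface maps naturally onto stateless HTTP calls from remote clients, which is the property the paragraph above relies on.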

Meanwhile, however, there is no doubt that scale-out NAS deployment to very large volumes of data is thoroughly achievable. Indeed, in many cases it is now the primary option for huge volumes of file data in a highly-scalable clustered file system.

Scale-out NAS offers significant advantages over traditional, or scale-up, NAS. Traditional NAS is based on discrete file system instances, and is limited in terms of hardware scalability. Meanwhile, scale-out NAS allows expansion of its parallel file system across clusters of hardware nodes, with the ability to grow capacity and performance independently, often to petabyte scale.

An object lesson in fault-tolerance

Scale-out capability, therefore, keeps NAS competitive for larger data volumes. Of course, the scale-out model is also the norm for object storage, although object platforms such as Ceph and Scality operate in somewhat different ways from scale-out NAS.

Where NAS uses Raid to stripe and mirror data for protection, object platforms instead distribute objects (file data plus associated metadata) across the storage nodes available to them, protecting them either by replication or by a form of forward error correction (FEC) known as erasure coding.
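The idea behind erasure coding can be sketched with the simplest possible code: split an object into k fragments, add one XOR parity fragment, and place each fragment on a different node. Any single lost fragment can then be rebuilt from the survivors. This is an illustrative sketch only; production platforms generally use stronger codes, such as Reed-Solomon, that tolerate several simultaneous losses, and the function names here are assumptions.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length fragments."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list:
    """Split data into k equal fragments and append one XOR parity fragment."""
    frag_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(frag_len * k, b"\0")  # zero-pad the final fragment
    fragments = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    parity = reduce(xor_bytes, fragments)
    return fragments + [parity]


def recover(fragments: list) -> list:
    """Rebuild the single missing fragment (marked None) from the survivors."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one lost fragment")
    if missing:
        survivors = [f for f in fragments if f is not None]
        # XOR of all k+1 fragments is zero, so the missing one is the
        # XOR of everything that is left.
        fragments[missing[0]] = reduce(xor_bytes, survivors)
    return fragments


nodes = encode(b"unstructured data!", k=3)  # 3 data fragments + 1 parity
nodes[1] = None                             # simulate a failed node
print(recover(nodes))
```

Note that the rebuild reads from several surviving nodes in parallel rather than from one mirror or one Raid group, which is part of why distributing fragments spreads both the load and the risk.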

An issue for NAS and Raid is that as disk drives grow in capacity to meet the ongoing data explosion, the system’s ability to survive loss of drives becomes ever more tenuous. In the days when rebuilding a drive meant reassembling a few gigabytes of data, the time required was tolerable.

But with drives now in the terabytes, a rebuild can mean pulling several hundred times more data over an interface only 10 or 20 times faster than it was in the days of LVD parallel SCSI. As rebuild times grow, so does the risk of a second drive failure, and protecting against that also greatly increases the cost and complexity involved.
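The arithmetic behind this can be sketched in a few lines. The drive sizes and transfer rates below are illustrative assumptions, not measurements: a 9GB drive on a 40MB/s bus stands in for the older era and a 4TB drive on a 600MB/s link for the current one. The calculation is also a best case, since it assumes the interface rate is sustained for the whole rebuild.

```python
def rebuild_hours(capacity_gb: float, throughput_mb_s: float) -> float:
    """Best-case hours to rewrite a whole drive at a sustained transfer rate."""
    seconds = capacity_gb * 1000 / throughput_mb_s
    return seconds / 3600


# Assumed figures for illustration only.
old = rebuild_hours(capacity_gb=9, throughput_mb_s=40)
new = rebuild_hours(capacity_gb=4000, throughput_mb_s=600)

print(f"older drive:   {old:.2f} hours")
print(f"current drive: {new:.2f} hours")
print(f"rebuild takes {new / old:.0f}x longer")
```

With these assumed figures the drive holds roughly 440 times more data while the interface is only 15 times faster, so the best-case rebuild window grows about 30-fold. In practice a rebuild competes with production I/O and runs well below the interface rate, which widens the window further.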

In contrast, object storage is generally less efficient in its use of physical storage capacity and typically stores each object three times for resilience. It can use distributed nodes, however, and distributes the data (which can improve performance) and the risk. Data can also reside on commodity storage, which brings down the overall cost.
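The capacity cost of each protection scheme is easy to work out. The sketch below compares three-way replication with an erasure-coded layout; the 10+4 scheme (10 data fragments plus 4 coding fragments) and the 1,000TB target are assumptions chosen for illustration, not figures from any particular product.

```python
def replication_efficiency(copies: int) -> float:
    """Fraction of raw capacity that holds unique data under replication."""
    return 1 / copies


def erasure_efficiency(k: int, m: int) -> float:
    """Fraction of raw capacity that holds data with k data + m coding fragments."""
    return k / (k + m)


def raw_needed_tb(usable_tb: float, efficiency: float) -> float:
    """Raw capacity required to provide a given usable capacity."""
    return usable_tb / efficiency


usable = 1000  # assumed target: 1,000TB of usable capacity

for name, eff in [
    ("3x replication", replication_efficiency(3)),
    ("10+4 erasure coding", erasure_efficiency(10, 4)),
]:
    print(f"{name}: {eff:.0%} efficient, "
          f"{raw_needed_tb(usable, eff):.0f}TB raw for {usable}TB usable")
```

Under these assumptions, three copies need about 3,000TB of raw disk for 1,000TB of data, while the 10+4 layout needs about 1,400TB and still survives the loss of any four fragments. That trade-off is one reason commodity hardware matters to the economics of object storage.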

Data in the cloud

So, there are considerable attractions to an object infrastructure, even if you then use it to provide a file system interface – as indeed many cloud storage providers do.

Having said that, NAS is tried and tested. For smaller sites and data volumes, scale-up NAS will remain effective and simple to implement. Similarly, where you need outright performance and low latency in the datacentre, and of course for compatibility with today’s applications which expect CIFS or NFS, scale-out NAS is likely to remain king. NAS is also a good option where you have frequently changing data, because object storage is built with relatively static data in mind.

But while scale-out NAS can provide high performance, it is limited to perhaps a few petabytes and it comes at a cost. In particular there is networking complexity and expense, with some already implementing 40Gbit Ethernet or InfiniBand for storage traffic.

Once you hit scale, whether in terms of capacity, geographic coverage or both, object storage can provide many of the benefits more simply. It can also be more resilient – self-healing erasure coding is faster and more efficient than legacy technologies such as Raid – and will be more useful if you plan private cloud-type applications.

So, for many users a shift to cloud-oriented object storage, perhaps with a file-oriented overlay, will pay dividends for much of their unstructured data.


This was last published in January 2016


3 comments


Well, Mr. Betts has provided a well-written article on NAS and object-based storage. Two areas that need new thinking are his assertions that "today's applications" expect to use CIFS or NFS, and that NAS provides a better solution for smaller-scale, unstructured data storage. Applications that expect to use CIFS or NFS are legacy applications, and yes, you can support them using a gateway to an object storage cluster. Today's applications are written to use a RESTful API, like AWS S3, which can directly access an object storage cluster. It is a fallacy to think that object-based storage can only scale out. Object-based storage can also scale down, which means you can start with tens or hundreds of terabytes of storage, if that is all the data you need to protect. Not every object-based storage software can scale down. Cloudian HyperStore can do it, and it provides a simple interface that is easy to use and manage. A single storage administrator can manage up to 10 petabytes of object-based storage. Finally, the future of storage is flash and object. All "hot" and transactional data will reside on flash arrays and all "warm", "cold" and "archive" data will reside on object-based storage clusters.
Although object storage has actually been around for some time, it's rapidly maturing into an enterprise-ready product category. The main thing that makes object a much more attractive alternative is the built-in support for NFS/SMB (CIFS) that many object vendors are now including with their products, so users don't have to worry about finding a way, or one of the limited number of apps, to tap into the RESTful interface. With NFS/SMB support, object looks just like NAS.
Bryan, thanks for making some excellent points on object and NAS in your article, which I found instructive. You mentioned the examples of Ceph and Scality, so allow me to encourage you to take a look at the approach of Quobyte's data center file system: object store-like scale-out with an object store interface, while keeping all the advantages of a full-blown, high-performance enterprise file system. Thanks again.
Regards,
Kim