This article recaps the fundamentals of file and block storage, with the aim of highlighting the quite different characteristics of object storage – all three being forms of shared storage. Finally, we suggest the use cases best suited to object storage, as well as to file and block.
To see how object storage differs significantly from SAN and NAS protocols, let’s first look at those.
File and block are file system-based methods of storage access.
In both cases, there is a file system. We are all familiar with them – FAT and NTFS in Windows, ext in Linux, and so on. They organise data into files and folders in a tree-like hierarchy and give a path to the file while also retaining a small amount of metadata about the file.
That is the part we see. But under the bonnet, that file path and the file system also handle addressing to the physical location of blocks of storage on the media itself.
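The small amount of metadata a file system keeps per file is visible from any programming language. As a quick sketch, Python's standard library can resolve a path and return that metadata (the temporary file here is purely for illustration):

```python
import os
import tempfile

# Create a throwaway file so the example is self-contained
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("hello")
    path = f.name

# The file system resolves the path and hands back its metadata:
# size, permissions and timestamps - but little else
info = os.stat(path)
print(info.st_size)       # file size in bytes
print(oct(info.st_mode))  # permission bits
print(info.st_mtime)      # last-modified timestamp

os.remove(path)
```

Note how thin this is compared with object storage metadata, discussed below: the file system records what it needs to manage the file, not what the business knows about it.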
The key difference between file access/NAS and block access/SAN is that in NAS, the file system resides on the array. Here, an application’s I/O requests go via the file system resident on the NAS hardware, accessed as a volume or drive. In a SAN, the file system is external to the array and I/O calls are handled by the file system on the server, with only block-level information required to access data from the SAN.
Key practical difference
From that distinction arises the key practical difference between NAS and SAN.
NAS is best suited to the retention and access of entire files, and has locking systems that prevent simultaneous changes from corrupting files.
Meanwhile, SAN systems allow changes to blocks within files and so are extremely well suited to database and transactional processing.
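What "changes to blocks within files" means in practice is that an application can rewrite a small byte range in place without rewriting the whole file – exactly what a database does to its pages. A minimal sketch of such a sub-file, in-place write, using an ordinary local file to stand in for block storage:

```python
import os
import tempfile

# Create a 4 KiB "data file" standing in for a database's page store
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\x00" * 4096)
    path = f.name

# Block-style update: overwrite 16 bytes at offset 512 in place,
# without rewriting the rest of the file - the kind of sub-file
# write that block/SAN access is built for
with open(path, "r+b") as f:
    f.seek(512)
    f.write(b"UPDATED-RECORD!!")

with open(path, "rb") as f:
    data = f.read()
os.remove(path)

print(len(data))  # still 4096 bytes: only one small region changed
```

Object storage, by contrast, has no equivalent of the seek-and-write step: an update means replacing the whole object.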
Both usually come as array products, even if software-defined, and – depending on how high-end or not – with features such as synchronous and asynchronous replication, snapshots, compression and deduplication, and storage tiering. Both can also take advantage of flash storage.
SAN and NAS are well suited to what they do, but have drawbacks.
For example, NAS can be limited by scale. Historically, organisations put in a NAS box to service a department, but these proliferated and were unconnected, leading to silos of data. This issue is overcome with scale-out NAS, where multiple NAS instances operate as a single, highly scalable parallel file system.
The tree-like file system hierarchy can handle millions of files quite easily, but once you scale to billions, it can start to slow up.
Object storage brings massive scalability. That is because it works differently from the SAN and NAS protocols. It has no file system but, as with NAS, changes are made at the level of the whole file, or object.
Instead of a tree-like hierarchy, object storage organises files, or objects, in a flat layout. Objects are just objects, with unique identifiers.
That means object storage is massively scalable, to billions of objects, because the file organisation does not become unwieldy the bigger it becomes.
Objects also have metadata – potentially lots of it – all definable by the customer. That means any attribute can be associated with an object in its header metadata: the application it belongs to, its data protection characteristics, tiering information, when it should be deleted, and custom business- or organisation-related attributes.
So, object storage is eminently suited to analytics, being searchable in very large datasets for potentially almost any attribute.
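A toy in-memory model makes the contrast concrete. There are no directories, just unique identifiers mapping to objects, each carrying arbitrary customer-defined metadata that can be searched. The class and attribute names here are invented for illustration, not drawn from any real object storage API:

```python
import uuid

class ToyObjectStore:
    """Minimal in-memory sketch of a flat object store (illustrative only)."""

    def __init__(self):
        self._objects = {}  # flat namespace: unique ID -> (data, metadata)

    def put(self, data, **metadata):
        object_id = str(uuid.uuid4())  # unique identifier; no path, no folders
        self._objects[object_id] = (data, metadata)
        return object_id

    def get(self, object_id):
        return self._objects[object_id][0]

    def search(self, **criteria):
        """Return IDs of objects whose metadata matches every criterion."""
        return [
            oid for oid, (_, meta) in self._objects.items()
            if all(meta.get(k) == v for k, v in criteria.items())
        ]

store = ToyObjectStore()
oid = store.put(b"scan-001", application="radiology", retain_until="2030-01-01")
store.put(b"scan-002", application="billing")

print(store.search(application="radiology"))  # finds the first object by metadata
```

The `search` method is the point: because metadata travels with each object, very large datasets can be filtered on almost any attribute without walking a directory tree.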
Data protection is usually by erasure coding, sometimes by replication, although the former is considered more efficient than the latter because it produces less overhead data.
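The efficiency gap is easy to see with rough arithmetic. Assuming three-way replication against an illustrative 10+4 erasure coding scheme (numbers chosen for the example; real deployments vary):

```python
# Capacity overhead comparison (illustrative figures only)

# Three-way replication: every byte is stored three times,
# so protection costs two extra full copies
replication_copies = 3
replication_overhead = replication_copies - 1  # 2.0 = 200% extra capacity

# Erasure coding with k data fragments + m parity fragments (here 10+4):
# any 10 of the 14 fragments can rebuild the object, and protection
# costs only the parity fraction
k, m = 10, 4
ec_overhead = m / k  # 0.4 = 40% extra capacity

print(f"Replication overhead: {replication_overhead:.0%}")
print(f"Erasure coding overhead: {ec_overhead:.0%}")
```

In this sketch, erasure coding tolerates the loss of four fragments for 40% extra capacity, where replication spends 200% to tolerate the loss of two copies – which is why erasure coding is usually preferred at scale.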
Almost always, however, object storage data is “eventually consistent”, which means the multiple copies required by its data protection schemes are not written instantaneously, or anywhere near it. They will eventually become consistent with each other as erasure coding or replication works its way between locations.
But that multiple location attribute can also be an advantage, making object storage well-suited to an organisation with multi-regional needs.
By contrast, SAN and NAS can be “strongly consistent”, with near real-time mirrors of datasets possible.
Also, object storage cannot match the performance of SAN, and sometimes NAS, largely because of the sizeable header metadata each object carries. Nor can it offer the sub-file, block-level manipulation required for database and transactional work that SAN access can.
For those two key reasons, object storage is best suited to large datasets of unstructured data in which objects do not change that often.
Outside the pros and cons of the technology per se, object storage has the advantage of relative cheapness, often running on commodity hardware. That is in contrast to potentially expensive packaged array-type products from storage box suppliers.
Having said that, costs can come in other areas, such as changes to your software environment. Not all applications will necessarily be natively compatible with object storage calls. Built for NFS, SCSI and so on, they will need adapting to deal with the Get, Put, Delete and other commands of object storage.
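One common adaptation is a shim that presents file-style reads and writes on top of object commands. The sketch below is hypothetical – the `client`, its `get`/`put`/`delete` methods and the key name are stand-ins, not a real object storage SDK – but it shows the shape of the translation, and why whole-object semantics leak through:

```python
class ObjectBackedFile:
    """Hypothetical shim: file-style read/write on top of Get/Put/Delete.

    `client` stands in for any object store client exposing get/put/delete;
    this is a sketch of the adaptation pattern, not a real protocol binding.
    """

    def __init__(self, client, key):
        self.client = client
        self.key = key

    def write(self, data: bytes):
        # No seek, no partial update: a write replaces the whole object (Put)
        self.client.put(self.key, data)

    def read(self) -> bytes:
        # A read fetches the whole object (Get)
        return self.client.get(self.key)

    def delete(self):
        # Removal maps directly to Delete
        self.client.delete(self.key)

# Minimal in-memory stand-in client so the sketch runs on its own
class DictClient:
    def __init__(self):
        self.store = {}
    def put(self, key, data):
        self.store[key] = data
    def get(self, key):
        return self.store[key]
    def delete(self, key):
        del self.store[key]

f = ObjectBackedFile(DictClient(), "reports/2024.csv")
f.write(b"a,b\n1,2\n")
print(f.read())
```

Note what the shim cannot offer: there is no in-place update, so an application that rewrites small ranges of large files would re-upload the entire object each time.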
To sum up:
- NAS: Good at secure file sharing. Can become siloed. Scale-out NAS handles large scale well, but even it struggles at extreme scale.
- SAN: Good at transactional and database workloads. Can be expensive.
- SAN and NAS: Both can come with advanced storage features, such as replication. Both can be relatively costly compared with object storage on commodity hardware, although both SAN and NAS software-defined storage are available. Both lack the rich metadata of object storage.
- Object storage: Very scalable, suited to unstructured data and large datasets, potentially good for analytics via rich metadata. Lacks high-end performance and data protection is slow across clusters. Can be very cost-efficient, hardware-wise.