Object storage has attracted a lot of attention in recent years. Indeed, it has become the foundation data storage layer for much of the cloud and for many of the very popular services that have evolved there, from Facebook to Dropbox and pretty much everything in between.
However, it has also introduced – unnecessarily and dangerously, some would argue – an extra layer of complexity into storage planning. Its formal definition – that it is a storage architecture which manages data as objects, rather than managing it as either files or blocks – has left many people confused and wary.
In many ways, it could have been more productive and less confusing all round had the IT industry called the new concept an “enhanced file system”, or something along those lines. After all, what is an object but a song, a photo or perhaps a database? And weren't those merely files before, so why all the extra complexity?
The answer lies in the object metadata – the additional information that tells you useful things about the data. So while in most cases there is still a one-to-one mapping between objects and files, with each object also being a file, that isn't always true. More importantly, an object is not just a file, and it can be much, much more than that.
It is this extra metadata that allows us to apply rules from the physical world to data in the digital world. For instance, to know how to store and move data appropriately – to replicate it, say, or to archive it to cheaper storage tiers – we need to store it by content and context, rather than simply by file name. The object store adds the metadata tags that make that easier to do, and it adds a unique identifier to each object, eliminating duplication and allowing much greater scalability.
This scalability is one of object storage's key attractions, along with the way that object storage abstracts the file and block layers underneath, removing much of the storage administration overhead. Another is the way it stores data – which increasingly means unstructured data – in flexibly-sized containers, not as blocks of a fixed size, and stores it separately from the metadata, allowing the latter to be searched faster.
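To make the idea concrete, here is a minimal in-memory sketch of an object store that keeps payloads and metadata in separate maps. The class name, its methods and the tag names are hypothetical, and deriving the identifier from a SHA-256 hash of the content is an assumption made for illustration: it is one way a unique ID can eliminate duplicates, but many real object stores use UUIDs or user-chosen keys instead.

```python
import hashlib


class TinyObjectStore:
    """Toy in-memory object store: payloads and metadata live in separate maps."""

    def __init__(self):
        self._data = {}      # object ID -> raw bytes
        self._metadata = {}  # object ID -> dict of metadata tags

    def put(self, payload: bytes, **tags) -> str:
        # Assumption: the ID is the SHA-256 of the content, so identical
        # payloads map to the same object and are stored only once.
        object_id = hashlib.sha256(payload).hexdigest()
        self._data.setdefault(object_id, payload)
        self._metadata.setdefault(object_id, {}).update(tags)
        return object_id

    def get(self, object_id: str) -> bytes:
        return self._data[object_id]

    def find(self, **criteria) -> list[str]:
        # Searches the metadata index only; payloads are never read.
        return [
            oid for oid, tags in self._metadata.items()
            if all(tags.get(k) == v for k, v in criteria.items())
        ]


store = TinyObjectStore()
photo_id = store.put(b"...jpeg bytes...", kind="photo", owner="alice", tier="archive")
song_id = store.put(b"...mp3 bytes...", kind="song", owner="alice", tier="hot")
duplicate_id = store.put(b"...jpeg bytes...", kind="photo")  # same bytes, same ID
```

Because `find` only touches the metadata map, a query such as `store.find(tier="archive")` never has to open the objects themselves, which is the point of storing the two separately.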
Object storage's abstraction also makes it easier to geographically distribute or replicate data, as well as providing better data protection by replacing Raid with a more advanced scheme called forward error correction or erasure coding.
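The principle behind erasure coding can be sketched with the simplest possible code: split an object into k fragments and add one XOR parity fragment, so that any single lost fragment can be rebuilt from the survivors. This is only an illustration under stated assumptions. The function names are hypothetical, and production object stores generally use stronger codes (Reed-Solomon, for example) with several parity fragments so they can survive multiple simultaneous losses.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal fragments plus one XOR parity fragment."""
    size = -(-len(data) // k)                # ceiling division
    padded = data.ljust(size * k, b"\0")     # zero-pad to a multiple of k
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, fragments)
    return fragments + [parity]


def recover(fragments: list) -> list[bytes]:
    """Rebuild the single missing fragment (marked None) from the survivors."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) != 1:
        raise ValueError("single parity can repair exactly one lost fragment")
    survivors = [f for f in fragments if f is not None]
    repaired = list(fragments)
    repaired[missing[0]] = reduce(xor_bytes, survivors)
    return repaired


original = b"object storage spreads fragments across nodes"
stored = encode(original, k=4)       # 4 data fragments + 1 parity fragment
damaged = list(stored)
damaged[2] = None                    # simulate losing one node
restored = recover(damaged)
rebuilt = b"".join(restored[:4]).rstrip(b"\0")
```

In this sketch the five fragments could sit on five different nodes or sites, and the storage overhead is 25% rather than the 100% or more that full replication costs.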
There are challenges, of course, not least in that many applications were designed only to address block- or file-based storage. Several companies have developed storage gateways to bridge this gap. These present file or block interfaces to local applications, but store their data in the cloud.
Amazon and Microsoft, for example, offer LAN-based gateways that provide a local iSCSI interface to a remote object store. However, newer applications – and especially those built to run at hyperscale – are typically designed to use object storage from the outset.
Build or buy?
The popularity of object storage with cloud services means that one of the easiest ways to implement an object store is to do it in the cloud. All the major cloud storage providers offer object storage application programming interfaces (APIs). The best known is perhaps Amazon Web Services' S3, but there are many others, all subtly different.
However, it is just as feasible to build your own object store, either in-house or in a colocation facility. A wide range of products and services is available to help, between them supporting a range of object storage APIs.
Some major storage suppliers now offer appliances that provide object-based access alongside block and file, but most of the options available are software-based storage designed to run atop commodity storage hardware. These include notable open-source software projects such as Ceph, Lustre and OpenStack Swift.
There are arguments for and against each approach, depending on your needs and circumstances, your applications and workloads, and of course your users and your technical skills.
One thing to be aware of when planning for object storage is that the cloud and traditional storage management take rather different approaches to defining service levels and service level agreements (SLAs). For traditional storage, it has mainly been about access time and throughput but, from the cloud perspective, what matters more is response time, accessibility and ease of use. A simple example of the latter might be a dating website that stores thumbnail photos on fast solid-state storage and high-res images on cheaper, slower 7,200rpm disk.
Keeping it in-house
Security will be the big driver for many users. In Europe there are more stringent data privacy regulations than in the US (except perhaps in niche areas such as healthcare). And with the Safe Harbour provisions now ruled inadequate, many will want to keep their objects in-house. Another option is a national or EU-based cloud storage provider, of which there are several.
IT needs a good understanding of the workflow to implement a private object storage cloud. That means understanding the characteristics of the storage underneath and the needs of the users, most notably how quickly they need to access objects.
Applications with very fast access time requirements – such as video editing or transaction processing – are not well suited to the cloud. The same is true for those that shift large amounts of data. If you want to move terabytes a day, you must keep the data in-house or pay a lot for a dedicated line to your provider.
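A rough back-of-envelope calculation shows why daily terabyte-scale transfers are demanding. The sketch below uses decimal terabytes, and the 80% link efficiency default is an assumed figure for protocol overhead and contention, not a measured one. Both function names are hypothetical.

```python
def sustained_mbps(terabytes_per_day: float) -> float:
    """Average line rate in Mbit/s needed to move the given volume in 24 hours."""
    bits = terabytes_per_day * 1e12 * 8      # decimal terabytes -> bits
    return bits / 86_400 / 1e6               # per second, then to Mbit/s


def transfer_days(terabytes: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Days to move a dataset over a link, assuming a usable fraction of its rate."""
    bits = terabytes * 1e12 * 8
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 86_400


one_tb_daily = sustained_mbps(1)             # roughly 92.6 Mbit/s, around the clock
hundred_tb_over_gigabit = transfer_days(100, 1000)   # roughly 11.6 days
```

In other words, moving just 1TB every day needs a link that sustains more than 90Mbit/s continuously, and pulling 100TB over a 1Gbit/s line takes well over a week under these assumptions.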
Talking of capacity, the cloud was once seen as the go-to place for bulk storage, but now you can fit a petabyte in a single rack, so building a high-capacity private object store in-house is much more feasible than it once was.
Another consideration is whether you can use heterogeneous physical storage. Some object storage software supports this, allowing you to grow at any rate, but with other storage products (and the cloud) you will be limited to whatever the supplier offers you.
Going to the cloud
A big part of why people move to the cloud is capital expenditure versus operating expenditure. Building your own object storage cloud requires upfront investment in hardware, as well as capacity planning. Cloud providers do this for you.
Another factor is that some organisations struggle to find datacentre space and power. The latter can be mitigated by using Maid (massive array of idle disks) to power down inactive disks, or other low-power, deep-storage technologies such as Spectra Logic's S3-compatible BlackPearl tape library controller, but the former can only be fixed with more physical space.
Downsides to using the cloud include the difficulty of getting your data back, either because you no longer have the capacity to store the data on-site or simply because of the bandwidth needed. In some cases, such as bare-metal restores from cloud backup, it is faster to resort to bikernet, where your provider puts the data on a removable drive and gives it to a motorcycle courier.
Conversely, if you are going to need cloud processing power to do your big data analysis, then it might make sense to have the data in the cloud too, even though getting it back will be a pain.
A third option is a hybrid approach, where you keep active or private data in-house for fast access and security, but move inactive data to cheaper cloud storage. Some object solutions, such as Caringo Swarm, can also allow you to burst from private storage to public when more capacity is needed.
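The hybrid rule described above can be expressed as a small placement policy driven by object metadata. This is a sketch only: the `ObjectInfo` fields, the `placement` function and the 90-day inactivity threshold are all hypothetical choices for illustration, not the behaviour of any particular product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ObjectInfo:
    key: str
    private: bool            # must stay on-premises for security or compliance
    last_access: datetime


def placement(obj: ObjectInfo, now: datetime,
              inactive_after: timedelta = timedelta(days=90)) -> str:
    """Decide where an object should live under a simple hybrid policy."""
    if obj.private:
        return "in-house"                    # private data never leaves
    if now - obj.last_access >= inactive_after:
        return "cloud"                       # inactive data moves to cheaper storage
    return "in-house"                        # active data stays local for fast access


now = datetime(2016, 1, 1)
payroll = ObjectInfo("payroll.db", private=True, last_access=now - timedelta(days=400))
old_video = ObjectInfo("launch-2014.mp4", private=False, last_access=now - timedelta(days=400))
new_video = ObjectInfo("launch-2016.mp4", private=False, last_access=now - timedelta(days=2))
decisions = {o.key: placement(o, now) for o in (payroll, old_video, new_video)}
```

Checking the privacy flag first means an old but sensitive object such as the payroll database stays in-house however long it has been idle.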