
OpenStack Swift 101: the object store for OpenStack apps

We run the rule over the OpenStack Swift object storage architecture: its key components, how it achieves resiliency and the data protection methods in use and in development.

OpenStack is a collection of projects that deliver the components required to deploy a service-based private cloud. Code is delivered in twice-yearly releases, typically in April and October, with codenames that run alphabetically. Each release introduces new projects, features and enhancements.

The basic elements of OpenStack include compute (Nova, which delivers virtual machines), networking (Neutron) and storage (handled by Cinder, Swift and Manila).

Cinder delivers block-based storage, allowing data to persist across the creation and destruction of virtual machine instances. It evolved from what were originally called “Nova volumes”.

Block-based storage is well suited to holding a virtual machine’s disks but is less flexible for storing application data. Cinder volumes can’t be shared between running Nova instances, which makes it difficult to distribute access to data in an environment designed around the transient nature of individual virtual machines or instances.

To meet the needs for application data storage in OpenStack, the Swift project delivers a reliable, scalable and multi-user accessible object store.

Incidentally, the OpenStack project has also recently introduced Manila, a scale-out file storage service.

Swift is implemented as an object store, distributed across multiple nodes in an OpenStack infrastructure, using commodity disk storage components such as hard-disk drives and solid-state disks.

The term ‘object store’ implies no specific data format: content is stored as binary objects with associated metadata. Data is stored in and retrieved from a Swift cluster using ReST API calls made over standard HTTP or HTTPS.
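As a rough illustration of what such a call looks like, the Python sketch below builds (but does not send) an HTTP PUT that uploads an object together with user-defined metadata. It uses only the standard library; the endpoint URL and token are hypothetical, and it assumes a token has already been obtained from the cluster’s authentication service. Swift stores custom metadata supplied as `X-Object-Meta-*` headers alongside the object’s bytes.

```python
import urllib.request


def build_put_object(url: str, token: str, data: bytes,
                     metadata: dict) -> urllib.request.Request:
    """Build, but do not send, a Swift PUT that stores `data` plus user metadata."""
    headers = {
        "X-Auth-Token": token,  # issued by the auth service (assumed already obtained)
        "Content-Type": "application/octet-stream",
    }
    # Custom metadata travels as X-Object-Meta-<Name> headers.
    for name, value in metadata.items():
        headers[f"X-Object-Meta-{name}"] = value
    return urllib.request.Request(url, data=data, headers=headers, method="PUT")


req = build_put_object(
    "https://swift.example.com/v1/AUTH_demo/photos/cat.jpg",  # hypothetical endpoint
    "hypothetical-token",
    b"example object bytes",
    {"Camera": "X100"},
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it. Fetching the object is a GET on the same URL, and a HEAD returns only the metadata headers.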

The use of ReST (Representational State Transfer) means that each object within the Swift object store can be addressed through a unique URL, which identifies both the object’s location (its account and container) and the object itself (its name, or object ID). The open-source version of Swift distributed with OpenStack lets users choose their own object names when storing and referencing objects.

As OpenStack is by nature a multi-tenant environment, objects in Swift are stored within a simple hierarchy. A Swift object store is divided into accounts (also known as tenants or projects); each account holds containers, and each container holds objects.
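The hierarchy maps directly onto an object’s URL path. This sketch assumes Swift’s default conventions of a `/v1/` API version prefix and an `AUTH_` prefix on account names; the host is hypothetical. Within a container the namespace is flat, so a `/` inside an object name only creates a pseudo-directory and is left unescaped.

```python
from urllib.parse import quote


def swift_object_url(endpoint: str, account: str, container: str, obj: str) -> str:
    """Return <endpoint>/v1/<account>/<container>/<object>, percent-encoded."""
    return "/".join([
        endpoint.rstrip("/"),
        "v1",
        quote(account, safe=""),
        quote(container, safe=""),
        # '/' is an ordinary character in object names, so keep it as-is.
        quote(obj, safe="/"),
    ])


print(swift_object_url("https://swift.example.com", "AUTH_demo",
                       "photos", "2015/holiday cat.jpg"))
# https://swift.example.com/v1/AUTH_demo/photos/2015/holiday%20cat.jpg
```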

The use of containers provides the ability to apply storage policies to object data, for example to set the number of replicas kept of each object. A policy is assigned to a container when the container is created. Note that containers in this context are not related to those being popularised by companies like CoreOS and Docker; they are analogous to the buckets used in public object stores such as Amazon Web Services’ S3.
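A minimal sketch of choosing a policy, assuming the operator has defined a policy named `gold` (hypothetical) in the cluster configuration: the client sends an `X-Storage-Policy` header with the PUT that creates the container. The request is built but not sent, and the account URL and token are placeholders.

```python
import urllib.request


def build_create_container(account_url: str, token: str,
                           container: str, policy: str) -> urllib.request.Request:
    """Build, but do not send, a PUT that creates `container` under a storage policy."""
    return urllib.request.Request(
        f"{account_url.rstrip('/')}/{container}",
        headers={
            "X-Auth-Token": token,
            # Must name a policy the operator has defined; "gold" below is hypothetical.
            "X-Storage-Policy": policy,
        },
        method="PUT",
    )


req = build_create_container(
    "https://swift.example.com/v1/AUTH_demo", "hypothetical-token", "backups", "gold"
)
print(req.get_method(), req.full_url, req.get_header("X-storage-policy"))
```

Objects subsequently uploaded to `backups` inherit the container’s policy. A container’s policy is fixed at creation and cannot be changed afterwards.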

OpenStack Swift architecture

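Swift is implemented through a number of separate service components that deliver the scale-out and resiliency capabilities expected of object stores. These include proxy servers, which receive client API requests, along with account servers, container servers and object servers. The location of data is tracked by data structures known as ‘rings’: there is a separate ring for accounts, containers and objects, each mapping items to the devices that store them.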
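A ring is essentially a lookup table. The simplified Python sketch below mirrors the core idea: Swift hashes an object’s `/account/container/object` path with MD5 (salted with a per-cluster hash path suffix from `swift.conf`), keeps the top *part power* bits as a partition number, and looks the partition up in a table recording which device holds each replica. The suffix value, the tiny part power and the round-robin assignment are assumptions for illustration; Swift’s ring builder assigns partitions according to device weight and failure domain.

```python
import hashlib
import struct

PART_POWER = 8                   # 2**8 = 256 partitions; production rings use many more
PART_SHIFT = 32 - PART_POWER
HASH_PATH_SUFFIX = b"changeme"   # per-cluster secret from swift.conf (value assumed)


def get_partition(account: str, container: str, obj: str) -> int:
    """Hash the object path and keep its top PART_POWER bits as the partition."""
    path = f"/{account}/{container}/{obj}".encode("utf-8")
    digest = hashlib.md5(path + HASH_PATH_SUFFIX).digest()
    # First 4 bytes of the digest as a big-endian unsigned int, shifted right.
    return struct.unpack_from(">I", digest)[0] >> PART_SHIFT


def build_toy_ring(devices: list, replicas: int) -> list:
    """Return replica2part2dev: ring[r][p] is the device holding replica r of partition p.

    Naive round-robin assignment; replicas land on distinct devices as long as
    replicas <= len(devices).
    """
    parts = 2 ** PART_POWER
    return [[devices[(p + r) % len(devices)] for p in range(parts)]
            for r in range(replicas)]


ring = build_toy_ring(["dev0", "dev1", "dev2", "dev3", "dev4"], replicas=3)
part = get_partition("AUTH_demo", "photos", "cat.jpg")
print(part, [ring[r][part] for r in range(3)])
```

Because every proxy and storage node holds a copy of the same ring, any of them can compute where an object lives without consulting a central metadata server.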

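Actual object data is stored on object servers. Account and container servers hold the listings of containers and objects, proxy servers route client requests to the right storage nodes, and background processes such as replicators and auditors handle data protection.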

A service doesn’t imply a separate server. Several services can run on the same hardware, but high levels of resiliency are achieved by running multiple instances of each service across separate servers.

Separating data access services from data storage services allows a Swift instance to scale out in both capacity and performance. Data resiliency is implemented through the use of zones. A zone is a failure domain within a Swift ring, and Swift places the replicas of each object in different zones wherever possible.

Resiliency is achieved by creating multiple redundant copies of data (called replicas) and distributing them across separate failure domains (zones) in the infrastructure. A zone can be as small as a single disk drive or as large as a server or a rack of servers. Zones can in turn be grouped into regions, which allows high availability to be achieved by dispersing data geographically between datacentres. Read requests are served from one of the available replicas, and the proxy can be configured to prefer replicas in the nearest region.
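The placement rule can be sketched as follows. This toy Python function is a hypothetical helper, far simpler than Swift’s ring builder (which also weighs device capacity and regions): it assigns one replica per distinct zone while unused zones remain, and only then reuses a zone, never a device.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    zone: int


def place_replicas(devices: list, replicas: int) -> list:
    """Choose devices for each replica, spreading across as many zones as possible."""
    chosen = []
    used_zones = set()
    # First pass: at most one device per zone.
    for dev in devices:
        if len(chosen) == replicas:
            break
        if dev.zone not in used_zones:
            chosen.append(dev)
            used_zones.add(dev.zone)
    # Second pass: fewer zones than replicas, so reuse zones but not devices.
    for dev in devices:
        if len(chosen) == replicas:
            break
        if dev not in chosen:
            chosen.append(dev)
    return chosen


devs = [Device("sda", 1), Device("sdb", 1), Device("sdc", 2), Device("sdd", 3)]
print([(d.name, d.zone) for d in place_replicas(devs, 3)])
# [('sda', 1), ('sdc', 2), ('sdd', 3)]
```

With only two zones available, the third replica would fall back to the second device in zone 1, so losing a whole zone still leaves at least one copy.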

In common with many object stores, Swift uses eventual consistency for data that is replicated between zones. Block-based storage, by contrast, typically relies on either synchronous (immediate) or asynchronous (time-delayed) replication.

Eventual consistency is similar to asynchronous replication in that the consistency of data is managed in the background, separate from the writing and reading of objects. Object replicas are created as background tasks and replication completed as system resources (including network bandwidth) allow. This kind of replication is more suited to the scale-out Swift model, where individual servers may be offline or inaccessible as part of normal operations.
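The write path can be modelled in a few lines. For a three-replica policy, Swift’s proxy reports a write as successful once a majority (quorum) of replicas has been stored, and replicator processes later copy data to nodes that missed the write. The in-memory `ToyCluster` class below is hypothetical: it ignores the handoff nodes real Swift uses to hold data on behalf of unavailable nodes, and the timestamps Swift uses to decide which copy is newest.

```python
def quorum_size(n: int) -> int:
    """Majority of n replicas."""
    return n // 2 + 1


class ToyCluster:
    """Hypothetical in-memory model of replica nodes; not Swift code."""

    def __init__(self, replicas: int = 3):
        self.nodes = [{} for _ in range(replicas)]   # object name -> bytes
        self.online = [True] * replicas

    def put(self, name: str, data: bytes) -> bool:
        written = 0
        for node, up in zip(self.nodes, self.online):
            if up:
                node[name] = data
                written += 1
        # The client sees success once a majority of replicas are stored.
        return written >= quorum_size(len(self.nodes))

    def replicate(self) -> None:
        """Background pass: copy objects to online nodes that are missing them."""
        for name in {n for node in self.nodes for n in node}:
            source = next(node[name] for node in self.nodes if name in node)
            for node, up in zip(self.nodes, self.online):
                if up and name not in node:
                    node[name] = source


cluster = ToyCluster()
cluster.online[2] = False                              # one node down for maintenance
print(cluster.put("cat.jpg", b"v1"))                   # True: 2 of 3 is a quorum
print(sum("cat.jpg" in n for n in cluster.nodes))      # 2
cluster.online[2] = True
cluster.replicate()
print(sum("cat.jpg" in n for n in cluster.nodes))      # 3
```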

Erasure coding in Swift

Most commercial object store systems now support the protection of data using erasure coding. Data protection using replicas is expensive in terms of storage capacity (especially with flash storage), since three replicas consume three times the raw capacity of the data itself, whereas erasure coding provides data protection with only a fractional capacity overhead. The trade-off comes in performance: erasure coding runs algorithms on both writes and reads, transforming each object into a set of data and parity fragments (shards) that are distributed across the infrastructure and reassembled when the object is read.
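The capacity arithmetic and the reconstruction idea can be shown in Python. The sketch uses a single XOR parity shard, the simplest erasure code, which survives the loss of any one shard; Swift’s erasure coding support instead uses Reed-Solomon-style codes via the PyECLib library, which tolerate the loss of as many fragments as there are parity fragments. The 10+4 layout is an illustrative choice.

```python
from functools import reduce
from typing import List, Optional


def overhead(data_units: int, stored_units: int) -> float:
    """Extra raw capacity consumed, as a fraction of the original data size."""
    return (stored_units - data_units) / data_units


print(f"3 replicas: {overhead(1, 3):.0%} overhead")    # 3 replicas: 200% overhead
print(f"EC 10+4: {overhead(10, 14):.0%} overhead")     # EC 10+4: 40% overhead


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[Optional[bytes]]:
    """Split data into k equal data shards (zero-padded) plus one XOR parity shard."""
    size = -(-len(data) // k)                 # ceiling division
    padded = data.ljust(size * k, b"\0")
    shards = [padded[i * size:(i + 1) * size] for i in range(k)]
    return shards + [reduce(xor_bytes, shards)]


def decode(shards: List[Optional[bytes]], k: int, length: int) -> bytes:
    """Rebuild at most one missing shard (None), then reassemble the original data."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity can only recover one lost shard")
    shards = list(shards)
    if missing:
        # The XOR of all surviving shards equals the missing one.
        shards[missing[0]] = reduce(xor_bytes, [s for s in shards if s is not None])
    return b"".join(shards[:k])[:length]


data = b"hello swift erasure coding"
fragments = encode(data, k=4)
fragments[1] = None                                 # simulate a failed disk
print(decode(fragments, k=4, length=len(data)))     # b'hello swift erasure coding'
```

The rebuild step is where the performance cost mentioned above comes from: every degraded read requires computation across the surviving fragments rather than a simple copy.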

Erasure coding is currently only supported in beta mode within Swift, so end users should be careful about deploying it in production environments. However, we can expect erasure coding to be a future standard in Swift deployments, especially those at scale where the space/cost savings are the most beneficial.
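For reference, an erasure-coded storage policy is declared by the cluster operator in `swift.conf`. The fragment below is illustrative (the policy names and the 10+4 layout are assumptions), and the short Python helper parses it with the standard-library `configparser` to report each erasure-coded policy’s data and parity fragment counts.

```python
import configparser

# Illustrative swift.conf fragment; policy names and the 10+4 layout are assumptions.
SWIFT_CONF = """
[storage-policy:0]
name = gold
policy_type = replication
default = yes

[storage-policy:1]
name = ec104
policy_type = erasure_coding
ec_type = liberasurecode_rs_vand
ec_num_data_fragments = 10
ec_num_parity_fragments = 4
ec_object_segment_size = 1048576
"""


def ec_policies(conf_text: str) -> dict:
    """Return {policy name: (data fragments, parity fragments)} for EC policies."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    result = {}
    for section in parser.sections():
        if not section.startswith("storage-policy:"):
            continue
        policy = parser[section]
        # policy_type defaults to replication when omitted.
        if policy.get("policy_type", fallback="replication") == "erasure_coding":
            result[policy["name"]] = (policy.getint("ec_num_data_fragments"),
                                      policy.getint("ec_num_parity_fragments"))
    return result


for name, (k, m) in ec_policies(SWIFT_CONF).items():
    print(f"{name}: survives loss of {m} of {k + m} fragments, {m / k:.0%} overhead")
# ec104: survives loss of 4 of 14 fragments, 40% overhead
```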

This is not to say that improvements aren’t being made to existing data protection features. The Grizzly release of OpenStack, for example, introduced more granular controls to manage replica counts.

Commercial alternatives to Swift

Swift is an open-source platform, with a large amount of the support and coding coming from SwiftStack, a company that provides commercial support for Swift deployments. Other platforms are also available that support the Swift API and can therefore be used to replace or emulate the use of an open-source Swift deployment.

Implementations of the Swift ReST API are supported in object store platforms from Scality (since the Juno OpenStack release), Cleversafe, Cloudian, EMC Isilon, Hitachi HCP and others.


The benefits of using a commercial storage provider are clear. Data is protected by hardware and operational processes with which the customer is already familiar. Hardware can also be shared between OpenStack and non-OpenStack environments, so data can be exchanged with, or moved in and out of, a Swift-supported environment while remaining accessible through traditional protocols such as NFS and SMB.

Using external storage also gives the ability to make use of features such as backup, encryption and mature role-based access controls that are still somewhat scarce in the open-source implementation of Swift.

One thing to bear in mind when using external storage is that there is no requirement to use the Swift API. It’s perfectly possible to use other object-based APIs, such as the S3 API from Amazon Web Services. Although the two APIs aren’t directly interoperable, the code changes needed to switch between them are usually minor.
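One way to keep that switching cost low is to isolate the API differences in a thin layer. The hypothetical helpers below build equivalent upload requests for each API, showing where the two differ: URL layout, authentication and the metadata header prefix (`X-Object-Meta-` for Swift, `x-amz-meta-` for S3). The endpoint, region and bucket names are placeholders, and real S3 requests must also carry an AWS Signature Version 4 `Authorization` header, which is omitted here.

```python
from dataclasses import dataclass, field


@dataclass
class RequestSpec:
    """A backend-neutral description of an HTTP request (hypothetical helper type)."""
    method: str
    url: str
    headers: dict = field(default_factory=dict)


def swift_put(endpoint: str, account: str, container: str, key: str,
              token: str, meta: dict) -> RequestSpec:
    headers = {"X-Auth-Token": token}
    headers.update({f"X-Object-Meta-{k}": v for k, v in meta.items()})
    return RequestSpec("PUT", f"{endpoint}/v1/{account}/{container}/{key}", headers)


def s3_put(region: str, bucket: str, key: str, meta: dict) -> RequestSpec:
    # Virtual-hosted-style URL; the SigV4 Authorization header is deliberately omitted.
    headers = {f"x-amz-meta-{k.lower()}": v for k, v in meta.items()}
    return RequestSpec("PUT", f"https://{bucket}.s3.{region}.amazonaws.com/{key}", headers)


meta = {"Camera": "X100"}
print(swift_put("https://swift.example.com", "AUTH_demo", "photos", "cat.jpg",
                "hypothetical-token", meta))
print(s3_put("eu-west-1", "photos", "cat.jpg", meta))
```

Both APIs upload with a PUT to a URL naming a container (or bucket) and an object (or key), so application code that calls through a layer like this only needs the layer changed when moving between them.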
