Open source is rapidly becoming the prime choice for anyone building a private cloud or infrastructure-as-a-service (IaaS) platform. Increasingly, this means OpenStack, although Apache CloudStack remains a contender.
However, the question of which open-source storage components you place beneath your private cloud is just as important as the choice of cloud platform, if not more so, and that storage choice will depend heavily on your needs.
“The transition to cloud is often a big move for an organisation,” says OpenStack Foundation executive director Jonathan Bryce. “Simply saying 'flatten everything' is too much of a shock – it's complicated and it's a challenge. So it is essential to look at your needs, see what's running and how to do it step by step.
“In most enterprises with sizeable storage requirements, there will be traditional disk storage, whether it's direct-attached or remote over the network. There are shared file systems, such as NFS, and then the newest form is object storage.”
Bryce adds: “They all meet different needs within the enterprise, so the first thing to ask is what storage profile your enterprise has right now. Object storage is usually the smallest, but cloud storage usually points to object storage.
“OpenStack has an advanced object storage system that is open source and is run at large scale in production, where it powers many terabytes of data.”
IaaS frameworks such as CloudStack and OpenStack can manage, tier and integrate with a wide range of storage solutions. For example, they can work with proprietary enterprise-grade storage, with iSCSI or NFS servers built on commodity hardware, and with open-source storage systems such as Ceph, GlusterFS, Riak CS and Sheepdog.
Bryce notes that Sheepdog is “more popular in Asia”. He says tech-savvy companies are even creating their own distributed storage by taking server disks and exposing them using LVM in Linux. They then create a storage pool under the hypervisor and share that across VMs. In effect, it is storage virtualisation, but using tools that have been present in Linux for some time.
More on OpenStack storage
OpenStack has its own storage components – Cinder, Swift and Manila. Cinder delivers block storage using standard block protocols such as iSCSI, while Swift provides object storage and is analogous to Amazon S3. Manila, the newest project, adds file-based cloud storage. All these components can accommodate multiple back-end storage resources.
User interaction is key
It is important to recognise that each storage type has its own distinct needs and attributes. For example, the original OpenStack developers had hoped to leverage the block storage service to provide file-based storage too, but they realised that although the two are similar in many ways at the back end, they differ considerably in how users interact with that storage – hence the Manila project.
Access method should guide your choice of storage system, says Sirish Raghuram, CEO of IaaS specialist and managed OpenStack provider Platform9.
“What I see from a lot of customers now in open-source contexts is a relatively simple breakdown between Ceph and Swift,” he says. “Swift is a pure object store for modern apps written to work with something like Amazon S3. Ceph has some overlap with Swift, but also has capabilities that Swift doesn't – you can deploy Ceph for both object and block storage.
“My recommendation, if you're not sure which you will need, is Ceph. If you want S3 and are sure, then it's Swift. For file-based access, a lot of people just end up running NFS. There are a lot of solutions out there running on commodity hardware and software.”
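Raghuram's rule of thumb can be written down as a small decision function. The Python sketch below is purely illustrative: the function name and the access-type labels are invented here and are not part of any OpenStack or Ceph API. Sending block storage to Ceph is an inference from his remark that, of the two, only Ceph offers it.

```python
def suggest_storage(access, sure_of_needs=True):
    """Suggest a storage system following the rule of thumb quoted above.

    `access` is a hypothetical label: "object", "block" or "file".
    `sure_of_needs` is False when you do not yet know which type you need.
    """
    if not sure_of_needs:
        # "If you're not sure which you will need": Ceph covers object and block.
        return "Ceph"
    if access == "object":
        # S3-style object storage, and you are sure that is what you want.
        return "Swift"
    if access == "block":
        # Inferred: Swift is a pure object store, so block storage means Ceph.
        return "Ceph"
    if access == "file":
        # "A lot of people just end up running NFS."
        return "NFS"
    raise ValueError("unknown access type: " + repr(access))


print(suggest_storage("object"))                       # Swift
print(suggest_storage("object", sure_of_needs=False))  # Ceph
print(suggest_storage("file"))                         # NFS
```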
Raghuram says you should also think hard about whether your application will need shared storage at all.
“Is your data ephemeral or long-lived?” he says. “If you can get by without shared storage, that's quite a simplification.
“For example, an organisation trying to enable agile software development cares more about agility and workloads that are all very temporary in nature. There may not be a need for traditional shared storage.
“On the other hand, production workloads need a backup and replication story, so they'll go for Ceph.”
Referring to the latest Juno release of OpenStack, Bryce says: “Once the back-end systems are configured, you can connect to any storage in the datacentre, set priorities, do migrations, and so on. For example, you could set SolidFire as priority one, Sheepdog as priority two and LVM priority three.
“Then compute resources can ask for specific service levels, and if you realise that a service needs more performance, you can tell the system to move it. It then does a background snapshot and moves it, but admin remains all in one user interface.”
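The prioritised back ends and background migration Bryce describes can be pictured with a toy model. The Python sketch below is an assumption-laden illustration, not OpenStack's scheduler or API: the class names, the capacity figures and the idea of expressing a service level as a priority number are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    priority: int   # 1 is the top tier
    free_gb: int    # made-up capacity figure


@dataclass
class Volume:
    name: str
    size_gb: int
    backend: str = ""


def place(volume, backends, level=1):
    """Place a volume on the requested tier, or the next tier down with room."""
    candidates = [b for b in backends if b.priority >= level]
    for b in sorted(candidates, key=lambda b: b.priority):
        if b.free_gb >= volume.size_gb:
            b.free_gb -= volume.size_gb
            volume.backend = b.name
            return b
    raise RuntimeError("no back end has room for " + volume.name)


def migrate(volume, backends, target_name):
    """Move a volume to a named back end and release its old space."""
    if volume.backend == target_name:
        return
    by_name = {b.name: b for b in backends}
    target = by_name[target_name]
    if target.free_gb < volume.size_gb:
        raise RuntimeError(target_name + " has no room for " + volume.name)
    if volume.backend:
        by_name[volume.backend].free_gb += volume.size_gb
    target.free_gb -= volume.size_gb
    volume.backend = target_name


backends = [
    Backend("SolidFire", priority=1, free_gb=300),
    Backend("Sheepdog", priority=2, free_gb=500),
    Backend("LVM", priority=3, free_gb=1000),
]

vol = Volume("db-data", size_gb=200)
place(vol, backends, level=2)        # asks for the second tier: Sheepdog
migrate(vol, backends, "SolidFire")  # later found to need more performance
print(vol.backend)
```

In a real deployment the platform handles the snapshot and copy behind the scenes; the sketch only tracks which back end holds the volume and how much space each has left.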
Divergence or flexibility?
What all of the above means is that your private cloud/IaaS platform should not be seen as a single project. It is more of a control and management framework into which you can plug a wide variety of components, some developed under the aegis of the governing organisation (the OpenStack Foundation or Apache, for example) and others not.
This simple fact explains a lot of the confusion and FUD that has been spread in some quarters about OpenStack's supposed divergence and complexity.
That is not to say there isn't complexity – there certainly is. One way to address this complexity is to work with an IT partner that has already assembled a working OpenStack-based cloud platform from the available components.
Examples might include HP, Mirantis, Platform9 and Red Hat. Of course, each platform will have a different mix of implementation technologies and skills, with an inevitable degree of supplier lock-in, and you will need to decide what level of 'proprietaryness' you are willing to tolerate.
Another route could be to consider a more packaged – yet still open – IaaS platform, such as CloudStack.
“OpenStack is standardising datacentre automation and becoming an interconnect in the datacentre,” says Platform9's Raghuram. “That it has proprietary baggage is fair comment, but compare it with, say, vCloud, which is completely proprietary. OpenStack has its APIs clearly defined, regardless of supplier. If you use extensions outside that, it is clear what you are doing – what is core and what is not.”
Open-source cloud storage software
- OpenStack Swift delivers scale-out object storage, with similar functionality to Amazon S3. Modern applications built for the cloud will use object storage rather than files or LUNs.
- OpenStack Cinder is the block storage component, essential in a virtual infrastructure because it is how VMs and their data are stored. It presents storage as a block device via a protocol such as iSCSI or Fibre Channel, and includes features such as volume creation/deletion, snapshots and clones.
- Ceph is designed to run on commodity hardware and provide applications with object, block and file-based storage. Its technical foundation is the Reliable Autonomic Distributed Object Store (RADOS), which distributes data evenly across the storage cluster.
- GlusterFS is a distributed file system that can scale to petabytes of data under a single mount point. It builds a large storage unit from multiple smaller ones across one or many servers, and allows volume size to be increased on the fly by adding more servers. Red Hat Storage Server is a commercial implementation of GlusterFS.
- Sheepdog is a distributed object store with support for OpenStack, iSCSI and QEMU VMs. It includes features such as snapshots, clones and thin provisioning, and can manage thousands of disks and nodes, all from a relatively small code-base.
- Riak CS is designed as an object storage system with a simple operational model that provides an S3-compatible API and OpenStack integration. It provides high-availability, fault-tolerant storage with replication, monitoring and reporting, and the ability to transparently rebalance the storage cluster when a new node is added.
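The Ceph entry above says RADOS distributes data evenly across the storage cluster. Ceph's own placement algorithm is considerably more involved than anything shown here; as a rough illustration of the general idea only, the Python sketch below uses a swapped-in technique, rendezvous (highest-random-weight) hashing, to spread object names across a set of hypothetical node names.

```python
import hashlib
from collections import Counter


def pick_node(object_name, nodes):
    """Choose the node whose hash score for this object is highest."""
    def score(node):
        digest = hashlib.sha256(f"{node}:{object_name}".encode()).hexdigest()
        return int(digest, 16)
    return max(nodes, key=score)


# Hypothetical node names, used only for this illustration.
nodes = ["node-a", "node-b", "node-c", "node-d"]

# Place 10,000 made-up objects and count how many land on each node.
counts = Counter(pick_node(f"object-{i}", nodes) for i in range(10000))
for node in nodes:
    print(node, counts[node])
```

Because every client can compute the same answer from the object name alone, no central lookup table is needed, and removing a node only relocates the objects that lived on it.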
There are also several open-source projects that provide enterprise-grade personal and shareable cloud storage, analogous to Box and Dropbox. Typically, these can be hosted on-premises or using cloud resources such as AWS. Users then run local client apps to back up, sync and share files.