Storage in containers: The answer to persistent storage needs?

With compute requirements often fragmented, short-lived and bursty, traditional storage architectures can struggle to cope – so is storage in containers the solution?

For decades, the accepted wisdom in storage management has been the need for solid, persistent and (typically) hardware-based data storage systems. Over the past 20-plus years, this has meant a shared storage platform as the primary location for data.

As storage form factors become more fragmented, some suppliers even offer persistent storage based on temporary entities, such as containers. Does this make sense, and how can persistence be achieved with such an ephemeral construct?

Shared storage arose from a need to reduce costs, consolidate capacity and eliminate the management overhead of storage deployed across hundreds or even thousands of servers in the datacentre.

A shared storage array was a good solution. Fibre Channel and Ethernet networks offered the ability to connect servers over distance, without cabling issues. And servicing one or two physical devices, rather than hundreds, reduced maintenance and costs for the customer and supplier.

We now live in a different world. Today, applications are mostly virtualised and container technology has started to gain ground. Shared storage is seen as difficult to use, because it focuses on connecting physical servers to physical storage.

But modern applications work on logical volumes, file systems and object stores. Public cloud computing extends this paradigm, and obfuscates the physical view of hardware altogether.

Persistence of data is still important, however. So how can we achieve this while meeting the needs of new application deployment methods? It’s worth looking at what the requirements are.

Containers, storage array, I/O blender

Virtualisation brought us the problem of the I/O blender – an increasingly random workload created by many virtual machines that access the same LUN or file share. To overcome the issues of shared storage in virtualised environments, VMware (for example) developed its own file system with specific additional commands to reduce contention and fragmentation. We also saw the introduction of features such as VMware Virtual Volumes (VVOLs), which specifically aim to eliminate the physical LUN and treat virtual machines as objects.
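The I/O blender effect can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the VM count, block ranges and random arrival order are invented for illustration, and `sequential_fraction` and `blend` are hypothetical helper names, not part of any product.

```python
import random


def sequential_fraction(stream):
    """Fraction of requests whose block address directly follows the previous one."""
    if len(stream) < 2:
        return 0.0
    hits = sum(1 for prev, cur in zip(stream, stream[1:]) if cur == prev + 1)
    return hits / (len(stream) - 1)


def blend(per_vm_streams, rng):
    """Interleave per-VM streams in random arrival order, keeping each VM's own order."""
    cursors = [0] * len(per_vm_streams)
    pending = [i for i, s in enumerate(per_vm_streams) if s]
    merged = []
    while pending:
        vm = rng.choice(pending)
        merged.append(per_vm_streams[vm][cursors[vm]])
        cursors[vm] += 1
        if cursors[vm] == len(per_vm_streams[vm]):
            pending.remove(vm)
    return merged


rng = random.Random(42)
# Assumption: 8 VMs, each reading 1,000 consecutive blocks from its own region of one LUN
streams = [list(range(vm * 100_000, vm * 100_000 + 1_000)) for vm in range(8)]
merged = blend(streams, rng)

print(f"single VM: {sequential_fraction(streams[0]):.2f}")
print(f"blended:   {sequential_fraction(merged):.2f}")
```

Each VM on its own issues a perfectly sequential stream, but the merged stream that reaches the shared LUN is mostly non-sequential, which is the behaviour the array has to cope with.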

Issues in storage access seen with server virtualisation are exacerbated further with containers. In the container world, a single physical host may run hundreds of containers, each vying for storage resources. Having each container access a long and complex storage stack introduces the risk of contention and goes against the benefits of the lightweight nature of a container.

But this is what many suppliers are doing. Volume plugins for Docker, for example, provide automation to map LUNs and volumes on physical arrays to physical hosts and then onto an individual container.
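As a rough illustration of that mapping, the sketch below builds the two Docker CLI commands typically involved: one creates a named volume through a plugin, the other starts a container with that volume mounted. The plugin name `example/array-plugin` and the `size` option are hypothetical; real plugins define their own option keys. The commands are only printed, not executed.

```python
import shlex


def volume_create_cmd(name, driver, opts):
    """Build a `docker volume create` command for a given volume plugin."""
    cmd = ["docker", "volume", "create", "--driver", driver]
    for key, value in sorted(opts.items()):
        cmd += ["--opt", f"{key}={value}"]  # option keys are plugin-specific
    cmd.append(name)
    return cmd


def run_with_volume_cmd(image, volume, mount_path):
    """Build a `docker run` command that mounts the named volume into the container."""
    return ["docker", "run", "--rm", "-v", f"{volume}:{mount_path}", image]


# Hypothetical plugin name and option, for illustration only
create = volume_create_cmd("appdata", "example/array-plugin", {"size": "10GB"})
run = run_with_volume_cmd("postgres", "appdata", "/var/lib/postgresql/data")

print(shlex.join(create))
print(shlex.join(run))
```

Behind the first command, the plugin is what provisions the LUN or volume on the array and presents it to the host, so the container still sits on top of the full storage stack.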

With the increased adoption of public and hybrid cloud architectures, the idea of a central fixed storage array becomes something of a problem. Applications have become more portable, with the ability to spin up containers in seconds and in many different datacentre locations. This paradigm is in distinct contrast to that of physical servers, which typically would be installed and not moved for years, before eventual decommissioning.

As we can see, delivering storage for container environments brings a new set of requirements, including: 

  • Data mobility – Containers move around, so the data has to be able to do that too. Ideally, that means not just between hosts in one datacentre, but across geographic locations.
  • Data security – Containers need to be secured at a logical or application level, rather than at the LUN level, because containers expect to be recycled regularly.
  • Performance – Containers introduce the idea of hundreds of unrelated applications working on the same physical host. I/O must be efficient and easy to prioritise.

Delivering storage with containers

One solution to the problem of persistent storage and containers is to use containers themselves as the storage platform.

At first glance, this seems like a bad idea. A container is designed to be temporary, so can be discarded at any time. Also, an individual container’s identity is not fixed against anything that traditional storage uses. And there is no concept of host WWNs or iSCSI IQNs, so how can persistent storage with containers be achieved and why is it worth doing?

Let’s address the “why” question first.

As we have discussed, containers can be short-lived and were designed for efficiency. Eliminating the I/O stack as much as possible contributes to the overall performance of a container environment. If storage is delivered through a container, the communication path between application and storage is very lightweight – simply between processes on the same server. As an application moves, a container on the host can provide access to the storage, including spinning up a dedicated storage container if one did not already exist.

Clearly, there is a lot of back-end work to be done to keep data protected and available across multiple hosts, but this is less of a challenge than with traditional storage arrays because for many applications, only one container accesses a data volume at any one time.
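The single-accessor assumption is what keeps the coordination simple. A minimal sketch of it, using a hypothetical in-memory `VolumeAttachments` registry (a real system would need to hold this state durably and share it between hosts):

```python
class VolumeAttachments:
    """Tracks which container currently holds each volume (hypothetical helper)."""

    def __init__(self):
        self._holder = {}  # volume name -> container id

    def attach(self, volume, container):
        holder = self._holder.get(volume)
        if holder is not None and holder != container:
            raise RuntimeError(f"{volume} is already attached to {holder}")
        self._holder[volume] = container

    def detach(self, volume, container):
        # Only the current holder can release the volume
        if self._holder.get(volume) == container:
            del self._holder[volume]

    def holder(self, volume):
        return self._holder.get(volume)


attachments = VolumeAttachments()
attachments.attach("orders-db", "container-a")

try:
    attachments.attach("orders-db", "container-b")
    second_attach_allowed = True
except RuntimeError:
    second_attach_allowed = False

# When the application moves, the old container releases the volume first
attachments.detach("orders-db", "container-a")
attachments.attach("orders-db", "container-b")
```

With only one holder per volume, there is no need for the shared-access locking that a LUN presented to many hosts requires.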

Disaggregating access to storage in this way eliminates one of the issues we will see as NVMe becomes adopted more widely – the problem of having data pass through a shared controller. NVMe has much greater performance than traditional SAS/SATA, making a shared controller the new bottleneck for storage. Disaggregation helps mitigate this issue, in the same way as hyper-converged systems distribute capacity and performance for storage across a scale-out server architecture.

The question of “how” can be answered by looking at the location for persistence.

The media offers the answer here, with either spinning-disk HDDs or flash drives providing that capability. Configurations, access, and so on can be distributed across multiple nodes and media, with consensus algorithms used to ensure data is protected across multiple nodes and devices. That way, if any host or container delivering storage were to die, another can be spun up or the workload rebalanced across the remaining nodes, including the application itself. By design, the data would move with the application.
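The idea of protecting data across nodes can be sketched with a simple majority-quorum scheme. This is a deliberately simplified stand-in for a real consensus algorithm, not any supplier's implementation: the `Node` and `ReplicatedVolume` classes are hypothetical, the version counter lives in a single coordinator, and there is no recovery or rebalancing logic.

```python
class Node:
    """One storage node holding its own copy of each block."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.blocks = {}  # block id -> (version, data)


class ReplicatedVolume:
    """Writes and reads succeed only while a majority of nodes is reachable."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.quorum = len(nodes) // 2 + 1
        self.version = 0  # simplification: a single coordinator assigns versions

    def _live_nodes(self):
        live = [n for n in self.nodes if n.alive]
        if len(live) < self.quorum:
            raise RuntimeError("fewer than a majority of replicas are reachable")
        return live

    def write(self, block, data):
        live = self._live_nodes()
        self.version += 1
        for node in live:
            node.blocks[block] = (self.version, data)
        return len(live)

    def read(self, block):
        copies = [n.blocks[block] for n in self._live_nodes() if block in n.blocks]
        if not copies:
            raise KeyError(block)
        # Any two majorities overlap, so the newest version is always among the copies
        return max(copies, key=lambda copy: copy[0])[1]


nodes = [Node("node-a"), Node("node-b"), Node("node-c")]
volume = ReplicatedVolume(nodes)

volume.write(0, b"v1")
nodes[0].alive = False   # node-a fails; two of three nodes remain
volume.write(0, b"v2")
nodes[0].alive = True    # node-a returns with a stale copy
nodes[1].alive = False   # node-b fails
latest = volume.read(0)  # node-c still holds the newest version
```

Because every write and every read involves a majority of nodes, a read after a single failure still sees the latest acknowledged write, even when one of the surviving copies is stale.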

Container storage suppliers

This is the kind of architecture that is being implemented by Red Hat and by startups such as Portworx, OpenEBS and StorageOS. Each uses a distributed node-based scale-out architecture, with storage and applications that run on the same platform. Essentially, it is a hyper-converged model for containers.

Some suppliers, such as Scality (with RING) and Cisco HyperFlex (formerly Springpath), use containers within the architecture for scalability, even though the products are not just for container environments.

For all suppliers, integration with container orchestration platforms is essential. Kubernetes is leading this charge, with Kubernetes Volumes the most likely way for storage to be mapped in these environments.
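In Kubernetes, persistent storage is usually requested through a PersistentVolumeClaim that a pod then mounts as a volume. The sketch below builds both manifests as Python dictionaries and prints them as JSON, which `kubectl apply -f` accepts alongside YAML. The storage class name `container-storage`, the object names and the sizes are assumptions for illustration; each supplier defines its own storage class.

```python
import json


def persistent_volume_claim(name, size, storage_class):
    """Manifest asking the cluster for a volume of the given size and class."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],   # mounted read-write by a single node
            "storageClassName": storage_class,  # hypothetical, supplier-specific
            "resources": {"requests": {"storage": size}},
        },
    }


def pod_using_claim(pod_name, image, claim_name, mount_path):
    """Manifest for a pod that mounts the claimed volume at mount_path."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": pod_name},
        "spec": {
            "containers": [{
                "name": "app",
                "image": image,
                "volumeMounts": [{"name": "data", "mountPath": mount_path}],
            }],
            "volumes": [{
                "name": "data",
                "persistentVolumeClaim": {"claimName": claim_name},
            }],
        },
    }


claim = persistent_volume_claim("orders-data", "10Gi", "container-storage")
pod = pod_using_claim("orders-db", "postgres", "orders-data", "/var/lib/postgresql/data")

print(json.dumps(claim, indent=2))
print(json.dumps(pod, indent=2))
```

The application only names the claim; which storage system satisfies it is decided by the storage class, which is where a container-based storage platform would plug in.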

Maturity issues

There are some issues that still need to be considered as the technology matures.

First, there is the question of data services. Clearly, compression and deduplication have an effect on performance. The efficiency of these features will be key in gaining adoption, as we saw with the all-flash market. End-users will also expect data protection, such as snapshots, clones and replication.

Then there is the subject of integration with public cloud. Today’s solutions are mostly focused on single-site implementations, but true mobility means being able to move data around in a hybrid environment, which is much more of a challenge.

Finally, we should highlight issues of security.

The recent Meltdown vulnerability poses a specific risk for containers, because on unpatched systems one container may be able to read data belonging to another. This raises questions about data security and the use of techniques such as in-memory encryption that may be required to protect against the inadvertent leaking of data.

There is a particular challenge for the startups to solve here, which may have a direct impact on the uptake of container-based storage systems. It may also make some businesses think that the idea of physical isolation (shared storage) goes some way to mitigating against unforeseen risks as they are discovered and reported.
