Containers are a hot topic. The technology, which has been most typified by Docker, enables applications to be deployed as a lightweight set of processes rather than an entire virtual machine (VM).
Docker has built out an ecosystem that enables the rapid development of containers, combining a library of application images (the Docker Hub), an execution environment (the container engine) and a set of orchestration features that include the native Swarm application and support for Kubernetes.
Initially, containers were expected to be short-lived, lasting for perhaps a few minutes or hours. Typical use cases were for applications built from microservices and for web-based applications.
However, it has become increasingly obvious that the container ecosystem can be used to deploy much more long-lived applications, including traditional and NoSQL databases.
It is possible to build a range of systems, from typical three-tier to much more complex web-native applications.
Depending on how the technology is implemented, data resilience may exist in the application or be provided by the infrastructure. As a result, we have seen a need for persistence in the storage layer that ensures data is available across container instantiations.
We should not forget that as container technology matures, there are other requirements too. Some of these have yet to be fully developed, but will become more important over time.
- Persistence – Persistence needs to ensure data extends past the life of the container, as well as having resilience across hosts. This is essential if the application has no native resilience built in or a container moves between hosts automatically.
- Security – How is application data secured? This requirement includes the need to encrypt and control access from each container and any other external services (such as backup).
- Performance – How will I/O performance be managed for the data used by each container? There is a need to manage I/O throughput and latency.
- Protection – Data protection still applies in the container world. This includes protection against physical hardware failures, such as drive or media faults, as well as the ability to withstand larger infrastructure failures.
- Mobility – Perhaps one of the most interesting challenges will be to manage the availability of data across multiple locations, as containers are used in private and public datacentres.
Docker volumes versus bind mounts
The options for storage and containers depend on the intended usage at the container layer.
Within the Docker ecosystem, storage can be delivered from the host that runs the container. This storage space is provided as a newly-created file directory (created when the container starts) or as a directory that already exists on the host.
The former technique is called a Docker volume. This is essentially a Docker-managed directory (usually under the /var/lib/docker/volumes folder).
Docker volumes can be created ahead of time or at the time a container starts and can be shared by multiple containers. It is also possible to use volume plugins, which we will discuss in more detail below.
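As a brief sketch of how this looks in practice (volume, container and image names here are illustrative), a named volume can be created ahead of time and shared between containers:

```shell
# Create a named volume (stored under /var/lib/docker/volumes on the host)
docker volume create app-data

# Start a container with the volume mounted at the database's data directory
docker run -d --name db1 -v app-data:/var/lib/mysql mysql:8

# The same volume can be mounted read-only into a second container
docker run -d --name reporting -v app-data:/data:ro alpine sleep infinity

# Inspect the volume to see its mount point on the host
docker volume inspect app-data
```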
The second option is to use a directory on the host to map into a container, also called a bind mount. This approach can be fast and flexible, provided the container data does not need to survive the loss of the host.
A bind mount can be created in the root file system or on another device connected to the host, for example local storage in the host/server or a shared storage LUN. It is even possible to mount an NFS share into a host and use this to store data too.
However, containers that use bind mounts have read/write access to the file system they connect to. This makes it possible to mount system directories directly into the container, which is probably not a good thing to do. So you need to be careful when specifying the parameters of a bind mount.
Choosing whether to use bind mounts or volumes depends on the usage. The bind mount technique provides an ability to share data between host and container, whereas volumes provide much more independence from the host configuration itself.
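To illustrate the bind mount approach (host paths and names here are illustrative), an existing host directory is mapped directly into the container, and marking it read-only guards against the container modifying the host file system:

```shell
# Bind-mount an existing host directory into a container, read-only
docker run -d --name web \
  --mount type=bind,source=/srv/web-content,target=/usr/share/nginx/html,readonly \
  nginx

# The older -v syntax achieves the same mapping
docker run -d --name web2 -v /srv/web-content:/usr/share/nginx/html:ro nginx
```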
Docker volume plugins
What happens if you want to use data from an external storage array?
Using shared storage provides the capability to survive host failures. In a container context, however, using shared storage takes a little work. Understanding why requires a look at how shared storage platforms developed.
Traditional SANs (storage area networks) arose from the need to centralise storage that was dispersed across many physical servers. Centralisation provided better management, security, maintenance and efficiency. A physical LUN or volume from a shared array, for example, is mapped and masked to a physical adaptor in the host.
As we moved to virtualisation, storage was mapped to the hypervisor, from where the LUN or volume was divided up. A physical LUN becomes, for example, a data store on VMware vSphere and typically within that data store, each virtual machine disk is a VMDK file. A file share from NAS would have VM directories and VMDK files within those directories.
From a container perspective, storage mapped to a physical host appears as a block device, which then has to be formatted with a file system. This file system can then be mounted to the container to provide persistent storage.
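Done by hand, those steps look something like the following (the device name and paths are illustrative):

```shell
# Format the LUN presented to the host as a block device
mkfs.ext4 /dev/sdb

# Mount it into the host's file system
mkdir -p /mnt/array-lun1
mount /dev/sdb /mnt/array-lun1

# Bind-mount the mounted file system into a container as persistent storage
docker run -d --name db -v /mnt/array-lun1:/var/lib/mysql mysql:8
```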
All this sounds quite unwieldy and so, to manage the process more effectively, Docker introduced volume plugins. Suppliers write plugin software to the Docker specification that automates the process of creating the LUN/volume and mapping it correctly to the host and eventually the container.
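As a hedged sketch (the plugin name and options below are illustrative; each supplier documents its own driver and parameters), installing and using a volume plugin follows a common pattern:

```shell
# Install a supplier's volume plugin from Docker Hub (name is illustrative)
docker plugin install vendor/volume-plugin

# Create a volume through the plugin; the driver provisions the LUN/volume
# on the array and handles mapping it to the host
docker volume create --driver vendor/volume-plugin --opt size=20GB array-vol1

# Attach the array-backed volume to a container like any other volume
docker run -d --name db -v array-vol1:/var/lib/postgresql/data postgres:15
```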
External storage pros and cons
Naturally, using external storage introduces the benefits of resilience and persistence. If the container dies, the volume can be re-attached. If the host dies, the LUN/volume can be moved and mounted to another host. Performance can be applied at the LUN level, where systems offer features such as QoS (quality of service).
Implementing security is trickier. An external volume can be mapped to a specific container host, but there is no intrinsic security to assign a volume to a container, other than that provided by the host and container orchestration software. So Kubernetes or Swarm needs to get the assignment correct.
Supplier support comes in the form of Docker plugins. Most of the major storage array suppliers offer plugin support, including Pure Storage, NetApp, HPE (3PAR and Nimble) and EMC, through REX-Ray and the EMC {code} project.
There are also plugins from startups such as Portworx and StorageOS, which have platforms specifically designed for container storage. Platform suppliers such as Red Hat provide support through GlusterFS.
There are also some third-party plugins that provide access to local storage and NFS resources. A list can be found on the Docker plugins page, although this is not updated very often.
Kubernetes is fast becoming the standard for container deployments. The Kubernetes ecosystem provides volume support that does not have to be tied to Docker. A Kubernetes volume exists for the lifetime of a pod, which encompasses multiple containers used to describe an application.
Today, Kubernetes volume support includes generic NFS, iSCSI and Fibre Channel support, as well as cloud-specific offerings such as Microsoft Azure and supplier support, including Portworx, ScaleIO, GlusterFS and StorageOS.
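A minimal sketch of a Kubernetes pod using an NFS volume (the server address, export path and names are all illustrative) shows how a volume is declared at the pod level and mounted by a container within it:

```shell
# Define a pod whose volume is an NFS share; the volume lives for the
# lifetime of the pod and can be shared by all containers within it
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: web-pod
spec:
  containers:
  - name: web
    image: nginx
    volumeMounts:
    - name: content
      mountPath: /usr/share/nginx/html
  volumes:
  - name: content
    nfs:
      server: 10.0.0.50
      path: /exports/web
EOF
```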
Most external storage support is volume-based and does not directly support moving data from on-premises infrastructure to the cloud.
We should expect to see offerings improve, with more global file systems used to enable persistence of data in Docker.
Block devices will provide a short-term solution as the requirements mature and file systems become more widely adopted.