Container technology provides a great way to create, manage and run applications with a high degree of efficiency. A physical host running Docker, for example, can manage hundreds of containers at the same time, through efficient use of scheduling and memory resources.
A container is effectively a collection of processes needed to run an application that uses features of the underlying operating system to access networking and storage functions.
Operating system features such as cgroups and namespaces provide process isolation between containers, making it appear to each container that it is the only instance running on the server.
The initial intention was for containers to be transient or temporary in nature, spun up to manage a particular demand or workload. While this is an efficient way to run applications for the time they are needed, the fact that a container (by default) loses all of its data when deleted is a problem for applications such as databases, where persistent storage is essential rather than just desirable.
Docker provides mechanisms to enable persistent data across the lifetime of a container. (This article refers to features up to and including Docker 1.12. The Docker ecosystem is changing rapidly and new features are released all the time, so be sure to check which version of the runtime you are using.)
There are four ways to provide persistent storage to Docker containers: data volumes, data volume containers, mounting a host directory and Docker storage plugins.
Docker data volumes
A data volume is a directory within the file system of the host that is used to store persistent data for a container (typically under /var/lib/docker/volumes). The directory appears as a mount point specified by the administrator when starting the container up (eg /data).
By default, volumes are given randomly generated 64-character hexadecimal names, unless a friendlier name is provided.
Pro tip: It is definitely worth providing a friendly name that relates to the name of the associated container, as this becomes especially helpful when doing clean-up on orphan volumes.
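As a rough sketch of the naming advice, the Python helper below assembles (but does not execute) a docker run command line. The -v flag accepts either a bare mount point, which leaves Docker to generate the volume name, or a name and mount point separated by a colon. The helper name, the example/app image and the container and volume names are all illustrative assumptions.

```python
def run_with_volume(image, mount_point, volume_name=None, container_name=None):
    """Build the argv for `docker run` with one data volume (not executed)."""
    argv = ["docker", "run", "-d"]
    if container_name:
        argv += ["--name", container_name]
    # "-v /data" creates an anonymous volume; "-v name:/data" a named one.
    spec = f"{volume_name}:{mount_point}" if volume_name else mount_point
    argv += ["-v", spec, image]
    return argv


# Hypothetical image, container and volume names, for illustration only.
anonymous = run_with_volume("example/app", "/data")
named = run_with_volume("example/app", "/data",
                        volume_name="orders-db-data",
                        container_name="orders-db")
print(" ".join(anonymous))
print(" ".join(named))
```

Naming the volume after its container (orders-db and orders-db-data here) is what makes later clean-up easier.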
Docker data volumes created in this way will persist across the container being started and stopped. Details of the volumes attached to a container can be found by running docker inspect against the container; attached volumes are shown in the “Mounts” section. A list of volumes can be found using the docker volume ls command. However, there’s no direct way to show the container associated with a volume using the docker volume commands.
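One way to work around the missing volume-to-container lookup is to invert the “Mounts” data that docker inspect reports. The sketch below parses a trimmed, hypothetical sample of that JSON; the container names, paths and the helper function are assumptions made for illustration, and real output contains many more fields.

```python
import json

# Hypothetical, heavily trimmed `docker inspect` output for two containers.
SAMPLE_INSPECT = """
[
  {"Name": "/orders-db",
   "Mounts": [{"Name": "orders-db-data",
               "Source": "/var/lib/docker/volumes/orders-db-data/_data",
               "Destination": "/data",
               "Driver": "local",
               "RW": true}]},
  {"Name": "/web", "Mounts": []}
]
"""


def containers_by_volume(inspect_json):
    """Map each volume name to the (container, mount point) pairs using it."""
    usage = {}
    for container in json.loads(inspect_json):
        name = container["Name"].lstrip("/")  # inspect prefixes names with "/"
        for mount in container.get("Mounts", []):
            volume = mount.get("Name")  # skip mounts that carry no volume name
            if volume:
                usage.setdefault(volume, []).append((name, mount["Destination"]))
    return usage


usage = containers_by_volume(SAMPLE_INSPECT)
print(usage)
```

In practice the JSON could come from running docker inspect against every container ID listed by docker ps -aq.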
Information written to a data volume is managed outside the storage driver that is normally used to manage Docker images. This means a volume is generally accessed faster than a directory within the container’s own file system, because the storage driver doesn’t have to track differences between the written data and the container image.
Unfortunately, data volumes have limitations. A volume can’t be attached to a container that is already running, and a volume with a generated name is hard to match to a new container once the original container has gone. This can lead to orphan volumes (volumes with no associated container), which are awkward to clean up, especially when friendly volume names haven’t been used.
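Finding orphans amounts to a set difference between all known volumes and the volumes that some container still references. The sketch below shows that logic on hypothetical data; the volume and container names are invented, and the function is an illustration rather than anything Docker ships.

```python
def orphan_volumes(all_volumes, volumes_by_container):
    """Return the volumes that no container references, sorted by name."""
    in_use = {vol for vols in volumes_by_container.values() for vol in vols}
    return sorted(set(all_volumes) - in_use)


# Hypothetical data: volume names as `docker volume ls -q` might list them,
# and the volume names found in each container's "Mounts" section.
all_volumes = ["orders-db-data", "old-test-data", "ab12" * 16]
volumes_by_container = {"orders-db": ["orders-db-data"], "web": []}

orphans = orphan_volumes(all_volumes, volumes_by_container)
print(orphans)
```

Docker can also report unreferenced volumes itself through the dangling=true filter on docker volume ls. The generated 64-character name in the sample shows why friendly names help: nothing in it says which container it once belonged to.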
Data volume containers
An alternative to volumes is to create a container used specifically to manage volumes. This container doesn’t run any code, but instead acts as a single access point from which other containers can access a data volume.
The advantage of using a volume container is that it effectively allows a volume to be re-used, including across multiple containers. The volumes of an existing volume container can be specified when a new container is created, allowing one or more containers to access the same volume.
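The pattern can be sketched as two command lines: one that creates the volume container without running anything useful in it, and one that starts an application container with the --volumes-from flag. The Python below only builds the argument lists; the container names, the busybox and example/app images and the /dbdata mount point are assumptions for illustration.

```python
def volume_container(name, mount_point, image="busybox"):
    """argv to create (not start) a container that only holds a volume."""
    return ["docker", "create", "-v", mount_point,
            "--name", name, image, "/bin/true"]


def run_sharing_volumes(name, source_container, image):
    """argv to start a container that mounts the volumes of source_container."""
    return ["docker", "run", "-d", "--name", name,
            "--volumes-from", source_container, image]


# Hypothetical names: one volume container shared by two application containers.
commands = [
    volume_container("dbstore", "/dbdata"),
    run_sharing_volumes("app1", "dbstore", "example/app"),
    run_sharing_volumes("app2", "dbstore", "example/app"),
]
for argv in commands:
    print(" ".join(argv))
```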
When the underlying data volume container is deleted, the data becomes orphaned, so any information should be copied out or saved before this happens. This could be achieved simply by running a container that connects to the volume and then executes a backup or copy script.
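A minimal sketch of such a backup, assuming a volume container called dbstore with its volume at /dbdata (both hypothetical): a throwaway container mounts the volume through --volumes-from, mounts a host directory at /backup, and writes a tar archive there. The code builds the command line without running it.

```python
def backup_command(volume_container, volume_path, backup_dir,
                   archive="backup.tar"):
    """argv for a throwaway container that tars a volume into a host directory."""
    return [
        "docker", "run", "--rm",             # remove the helper when it exits
        "--volumes-from", volume_container,  # expose the volume to be saved
        "-v", f"{backup_dir}:/backup",       # host directory for the archive
        "ubuntu",
        "tar", "cvf", f"/backup/{archive}", volume_path,
    ]


# Hypothetical container name and paths.
cmd = backup_command("dbstore", "/dbdata", "/srv/backups")
print(" ".join(cmd))
```

Because nothing locks the volume, the application would normally need to be quiesced or stopped first for the archive to be consistent.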
With standard and volume containers, data in a volume is directly accessible from the container host. This means standard host tools can access the data, if required.
Note, however, that there are no locking mechanisms to prevent concurrent access from multiple containers or from the host, and data corruption can occur if concurrent access isn’t managed correctly.
Mounting a host directory
The third option for persistent data is to mount a directory (or file) from the host itself into the container.
This allows an existing data structure on the host to be presented to a container in a persistent and reusable format. Mounted directories can be either read/write or read-only, depending on the usage required. For example, read-only directories could be used for source code and read/write directories for application data.
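The source-code and application-data example might look like the sketch below. A host directory is passed to -v as host path, container path and an optional ro suffix for read-only access. The helper and all paths are hypothetical, and the command is assembled rather than executed.

```python
import posixpath


def bind_mount(host_path, container_path, read_only=False):
    """Return the value for `-v` that mounts a host path into a container."""
    for path in (host_path, container_path):
        if not posixpath.isabs(path):
            raise ValueError(f"{path!r} must be an absolute path")
    spec = f"{host_path}:{container_path}"
    return spec + ":ro" if read_only else spec


# Hypothetical layout: source code read-only, application data read/write.
source = bind_mount("/srv/app/src", "/usr/src/app", read_only=True)
data = bind_mount("/srv/app/data", "/var/lib/app")
argv = ["docker", "run", "-d", "-v", source, "-v", data, "example/app"]
print(" ".join(argv))
```

The absolute-path check matters because a value that does not start with a slash is treated by Docker as a volume name rather than a host directory.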
The use of directory mounts does represent a security risk – a container can be started and given access to the host’s system directories (eg /etc). By default, a container runs with root access and therefore has root access to any mounted directory and the ability to delete or change content.
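One hedge against this risk is to screen mount requests before a container is started. The sketch below flags -v values that expose a sensitive host directory with write access. The list of sensitive directories is an assumption chosen for illustration, not a complete policy, and the function is not part of Docker.

```python
import posixpath

# Assumed examples of sensitive host directories; not an exhaustive list.
SENSITIVE_DIRS = ("/etc", "/boot", "/usr", "/bin", "/sbin", "/var/run")


def is_sensitive(host_path):
    """True if host_path is the root, or is inside a sensitive directory."""
    path = posixpath.normpath(host_path)
    return path == "/" or any(
        path == d or path.startswith(d + "/") for d in SENSITIVE_DIRS)


def risky_mounts(volume_specs):
    """Return the -v values that mount a sensitive host path read/write."""
    risky = []
    for spec in volume_specs:
        parts = spec.split(":")
        if len(parts) < 2 or not parts[0].startswith("/"):
            continue  # anonymous or named volume, not a host directory
        read_only = len(parts) > 2 and "ro" in parts[2].split(",")
        if is_sensitive(parts[0]) and not read_only:
            risky.append(spec)
    return risky


# Hypothetical mount requests.
specs = ["/etc:/host-etc", "/etc:/host-etc:ro", "/srv/app/data:/data",
         "orders-db-data:/var/lib/data", "/scratch"]
flagged = risky_mounts(specs)
print(flagged)
```

Running the container process as a non-root user, for example through the --user option of docker run, is another way to limit what a mounted directory exposes.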
Docker storage plugins
Docker storage or volume plugins provide a mechanism to access storage on external appliances via traditional protocols, such as iSCSI, Fibre Channel and NFS. The volume driver is specified on creation of a container, along with the name of the volume and the mount point. The driver takes care of creating the storage on the external appliance, creating a file system (for block devices) and mounting that file system on the host, before making it accessible to the container.
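From the command line, the driver, volume name and mount point come together as in the sketch below, which builds a docker run command using the --volume-driver option. The driver name example-driver, the volume name and the image are placeholders; a real plugin would have to be installed on the host and would define its own driver name.

```python
def run_with_driver(image, volume_name, mount_point, driver):
    """argv for `docker run` with a volume served by a named volume driver."""
    return ["docker", "run", "-d",
            "--volume-driver", driver,
            "-v", f"{volume_name}:{mount_point}",
            image]


# "example-driver" is a placeholder, not the name of a real plugin.
argv = run_with_driver("example/app", "orders-db-data", "/data",
                       driver="example-driver")
print(" ".join(argv))
```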
The volume driver provides simple primitives to make storage accessible from an external appliance, such as “create”, “remove” and “mount”. New features were added in Docker 1.12, including the ability to see the capabilities of the volume driver itself.
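To make the primitives concrete, here is a toy in-memory model of a driver. The endpoint names and the JSON-style request and response fields follow Docker's volume plugin protocol as I understand it. Everything else is an assumption for illustration, including the class, the /mnt/toy root and the refusal to remove a mounted volume. A real plugin would answer these requests over HTTP and provision actual storage.

```python
class ToyVolumeDriver:
    """In-memory stand-in for a volume plugin; no real storage is touched."""

    def __init__(self, root="/mnt/toy"):
        self.root = root
        self.mounts = {}  # volume name -> set of container IDs mounting it

    def handle(self, endpoint, request=None):
        request = request or {}
        name = request.get("Name")
        if endpoint == "/VolumeDriver.Capabilities":
            return {"Capabilities": {"Scope": "local"}}
        if endpoint == "/VolumeDriver.Create":
            self.mounts.setdefault(name, set())
            return {"Err": ""}
        if endpoint not in ("/VolumeDriver.Mount", "/VolumeDriver.Unmount",
                            "/VolumeDriver.Remove"):
            return {"Err": f"unsupported endpoint: {endpoint}"}
        if name not in self.mounts:
            return {"Err": f"no such volume: {name}"}
        if endpoint == "/VolumeDriver.Mount":
            self.mounts[name].add(request.get("ID"))
            return {"Mountpoint": f"{self.root}/{name}", "Err": ""}
        if endpoint == "/VolumeDriver.Unmount":
            self.mounts[name].discard(request.get("ID"))
            return {"Err": ""}
        # Remove: this toy model refuses while a container still mounts it.
        if self.mounts[name]:
            return {"Err": f"volume in use: {name}"}
        del self.mounts[name]
        return {"Err": ""}


driver = ToyVolumeDriver()
caps = driver.handle("/VolumeDriver.Capabilities")
created = driver.handle("/VolumeDriver.Create", {"Name": "vol1", "Opts": {}})
mounted = driver.handle("/VolumeDriver.Mount", {"Name": "vol1", "ID": "c1"})
busy = driver.handle("/VolumeDriver.Remove", {"Name": "vol1"})
driver.handle("/VolumeDriver.Unmount", {"Name": "vol1", "ID": "c1"})
removed = driver.handle("/VolumeDriver.Remove", {"Name": "vol1"})
print(caps, mounted, busy, removed)
```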
There is now a range of storage plugins available from both traditional storage vendors and startups that address the requirement to store container data on persistent storage.
Flocker is a management tool from ClusterHQ that provides the capability to manage container volumes across a cluster of hosts. As containers move between hosts, the Flocker API ensures that volume mount points are made available on the new target host. This provides capability for load balancing and fault tolerance in case of hardware failure.
Many vendors are supporting the Flocker API, including Kaminario, Dell, EMC, NetApp and Pure Storage. There is also support from software startups such as Hedvig and Nexenta.
There are also volume plugins available from HPE 3PAR, NetApp and vSphere. EMC has a project known as REX-Ray that provides vendor-agnostic storage connectivity, but also works with EMC’s storage portfolio, including ScaleIO, XtremIO, VMAX and Isilon.
Support is available for some public cloud platforms, including a driver for Azure File Storage and one to use Google Compute Engine’s persistent disks. There are also third-party plugins that offer simple NFS connectivity to resources available on external file systems. More details on all the available plugins can be found on Docker’s Engine Plugins page.
The capabilities of using persistent storage with Docker have moved on significantly in the last 18 months. The need for persistent data has become an accepted requirement, with local and external array integrations.
But a number of areas such as backup and data portability (between geographically-dispersed data centres) have yet to be solved.
These features will be important in gaining full enterprise adoption as container technology matures and becomes more widely accepted.