Docker backup: Deciding what to protect and how

Docker is a simplified form of virtualisation whose popularity has rocketed. We look at the key decisions about what parts of Docker you need to back up and how to do it

Docker’s frenetic growth is a marvel to observe. A 2018 RightScale survey of 1,000 IT professionals, spanning technologists and enterprise decision makers, found that Docker adoption had grown from 30% to 49% in just one year.

And as the container technology has reached the forefront of the DevOps space, users are thinking carefully about how best to manage their workflows. A huge part of this concerns backup.

There are dozens of reasons to have a well-oiled Docker backup procedure, the most obvious of which is contingency. Servers fail and datacentres suffer outages. Crisis management is an inevitable part of any tech role, but by preserving a facsimile of your container or volume you can respond decisively and quickly when things go wrong.

Backups can also help you navigate some of the inherent quirks of Docker.

Let’s suppose one of your containers shows as “unhealthy”, and the usual remedies don’t work. If you’ve got a backup, you can quickly revert to an earlier version that’s known to work.

Similarly, if you need to revert to a previous version of your codebase, a backup of the container can make that process far simpler and save you the effort of recreating a build.

Before we go on to examine Docker backup, let’s recap the technology.

Docker 101

Docker is a platform for building and hosting containerised software and, as you’d expect from any sophisticated DevOps technology, it does a lot of things.

Firstly, it includes the underlying APIs and tools required to create and provision containers. Secondly, Docker allows developers to host their applications in a way that’s secure and resource-efficient.

Container-based applications are packaged in a way that includes only the minimum dependencies. These are then hosted in a virtualised environment. Fewer moving parts means there’s less demand on the underlying iron that hosts the applications, especially when compared to traditional virtual machines that include a full operating system.

Docker backup fundamentals

While Docker brings a greater simplicity to virtualisation at some levels, that simplicity masks a great deal of complexity.

So, when we talk about Docker backup it helps to be extremely precise about what we mean. Docker is a large product with many layers of abstraction, each of which can be backed up.

To complicate matters further, these levels of abstraction vary between the various editions of Docker. Docker EE (Enterprise Edition) is subtly different to Docker CE (Community Edition). For the purposes of this article, we’ll talk about the Enterprise Edition.

Take Docker’s UCP (Universal Control Plane), for example, which handles things like cluster configurations, access control, and certificate management. Given this is the well-tuned engine at the heart of your Docker implementation, you’ll almost definitely want to take regular backups of this. Trust me, you don’t want to rebuild this from scratch.

And then you’ve got your swarm. This is the cluster of machines you control from within Docker. If you’re more concerned with taking granular backups, you can also image specific volumes and containers.
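As a rough illustration of what granular backups can look like with Docker's own command line, here is a minimal Python sketch that only builds the commands rather than running them. It assumes the common pattern of archiving a named volume from a throwaway helper container, and of snapshotting a container with docker commit followed by docker save. The function names, the alpine helper image and the /data and /backup mount points are illustrative choices, not anything Docker prescribes.

```python
from typing import List


def volume_backup_command(volume: str, dest_dir: str, archive: str) -> List[str]:
    """Build a command that archives a named volume into dest_dir.

    dest_dir is assumed to be an absolute host path, so that Docker
    treats it as a bind mount rather than as a volume name.
    """
    return [
        "docker", "run", "--rm",
        "-v", f"{volume}:/data:ro",    # volume to protect, mounted read-only
        "-v", f"{dest_dir}:/backup",   # host directory that receives the archive
        "alpine",                      # assumed helper image that ships tar
        "tar", "czf", f"/backup/{archive}", "-C", "/data", ".",
    ]


def container_image_commands(container: str, tag: str, out_file: str) -> List[List[str]]:
    """Build commands that snapshot a container as an image and export it.

    docker commit captures the container's filesystem but not the data
    held in mounted volumes, which is why volumes are archived separately.
    """
    return [
        ["docker", "commit", container, tag],
        ["docker", "save", "-o", out_file, tag],
    ]
```

Each returned list can be handed to subprocess.run(cmd, check=True) on a host where Docker is installed. Keeping command construction separate from execution makes the plan easy to inspect before anything touches a live container.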

Finally, there’s the Docker Trusted Registry (DTR). This is the platform used to manage the various images used within your swarm. This too has a separate backup procedure.

This, obviously, is a complicated morass of tools and products, each of which requires its own dutiful level of maintenance. I wouldn’t worry too much. One thing I think Docker does exceedingly well is that it comes with several built-in command-line tools and exposed APIs that allow you, the user, to manage how you preserve your critical container infrastructure.

This means there’s nothing stopping you from writing a cron job that takes backups of your various Docker components, and then rsyncs the image securely over SSH to a VPS, or perhaps even to an on-premises server.
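To make that idea concrete, here is a minimal Python sketch of the kind of script a cron job might call. It assumes a single named volume, a local staging directory and an SSH-reachable rsync target. All names and paths are hypothetical, the alpine helper image is an assumption, and the runner parameter exists only so the logic can be exercised without Docker or rsync installed.

```python
import subprocess
from datetime import datetime
from typing import Callable, List


def archive_name(prefix: str, now: datetime) -> str:
    """Return a timestamped file name so that nightly runs never collide."""
    return f"{prefix}-{now:%Y%m%d-%H%M%S}.tar.gz"


def rsync_command(staging_dir: str, remote: str) -> List[str]:
    """Build the rsync call that copies the staging directory over SSH."""
    # -a preserves permissions and timestamps, -z compresses in transit,
    # and -e ssh makes the transport explicit.
    return ["rsync", "-az", "-e", "ssh", staging_dir.rstrip("/") + "/", remote]


def run_backup(volume: str, staging_dir: str, remote: str, now: datetime,
               runner: Callable = subprocess.run) -> str:
    """Archive one volume into staging_dir, then ship the directory off-host."""
    name = archive_name(volume, now)
    tar_cmd = [
        "docker", "run", "--rm",
        "-v", f"{volume}:/data:ro",       # source volume, read-only
        "-v", f"{staging_dir}:/backup",   # absolute host path for archives
        "alpine", "tar", "czf", f"/backup/{name}", "-C", "/data", ".",
    ]
    runner(tar_cmd, check=True)                            # stop if the archive fails
    runner(rsync_command(staging_dir, remote), check=True)
    return name
```

A script built on this might end with a call such as run_backup("db_data", "/srv/backups", "backup@example.com:/backups/", datetime.now()) and be scheduled with a crontab entry along the lines of 0 2 * * * /usr/bin/python3 /opt/backup/docker_backup.py, where the host, paths and schedule are placeholders. Passing check=True means a failed archive step raises an error before anything is copied.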

Docker backup products

Of course, you know that’s not the end of the story. There is a plethora of corporate-built Docker backup tools. And while you don’t necessarily need them, they’re worth considering for a few reasons, including the fact they’re often well-polished and come with features you’d otherwise have to develop yourself.

But arguably the most compelling argument behind many of the various commercial container backup solutions is that they do other stuff too, and that allows you to manage other assets in your IT infrastructure from one product.

Bacula Automated Docker

In no particular order, let’s look at Bacula Automated Docker first. This is one of the least invasive tools out there. It’s a module that integrates straight into the Docker API and doesn’t require the user to install an agent into each container, which would be time-consuming and introduce complexity.

It can then capture the entire container, including read-only and writable layers, and save them as a single image. You can then use this image to spin up brand new containers automatically or manually.

Veritas NetBackup

Veritas NetBackup is an interesting duck. It’s unambiguously an enterprise product, aimed predominantly at high-volume and high-scale environments, and works across on-premises, virtual, and cloud environments. In addition to Docker, it also boasts support for other common data-heavy applications, like MongoDB, MariaDB, and PostgreSQL.

What’s interesting about Veritas NetBackup when it comes to Docker is that it offers three different approaches to data preservation and allows users to choose the most appropriate method, depending on their uptime, performance, and configuration needs.

One method sees the tool create a staging area to store application data. This acts as a buffer and periodically transports user data into a safe location. Another sees Veritas NetBackup integrate into the application using the “sidecar” model, extending its primary functionality with a tool that performs periodic backups of any application data. Finally, Veritas NetBackup can also integrate directly into the application container itself.

You can interpret this in a couple of ways. Either there’s no one ideal way to handle Docker backups, or users simply have different needs that, at times, require a subtly different approach. Personally, I lean towards the latter.

Don’t discount Open Source

It’s also worth mentioning that there’s a swathe of compelling open source (or otherwise free) applications that do much of the heavy lifting when it comes to preserving Docker containers and volumes.

One compelling option from Munich-based developer Steffen Bleul is called Volumerize. This covers much of the ground described earlier when talking about how users can script their own backup workflows. For example, it supports scheduling workflows with cron jobs, and can move files across a network using SCP and rsync. It even supports several popular cloud storage products, like Google Drive, Dropbox, and Amazon S3 (Simple Storage Service).

You’ve got a lot of choice here. Either way, before you do anything, please read up on Docker best practices. It’s far too easy to screw up in the most catastrophic of ways, resulting in corrupt containers and volumes – and yes, a great many headaches.
