Major releases and key improvements keep coming from the Docker team. The most recent releases are Docker Engine 1.11 and Swarm 1.2, and these updates are a big deal for storage!

Volume Plugins and Engine 1.11

The Docker platform continues to expand its capabilities while abstracting key services to keep the stack flexible for its consumers. The most visible example is Engine 1.11, which integrates containerd to deliver the promised OCI compliance and runtime openness. This was a major overhaul, and the good news is that volume plugins and their functionality are unaffected and still completely supported!

For those curious about the architecture, we’ve made a diagram to highlight how it integrates with plugins. Below you can see the Docker Engine, which now leverages containerd and individual executors, in this case runC, to run containers. If external volumes are used, the Engine calls the plugin through the existing plugin system. The plugin returns a mount point, which is then passed down to runC through an OCI bundle.

[Diagram: docker_oci_plugin.png]
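To make the Engine-to-plugin step a little more concrete, here is a minimal sketch of the kind of request the Engine sends when it needs a mount point. It relies on the Docker volume plugin API, where the Engine POSTs the volume name to `/VolumeDriver.Mount` and the plugin replies with a `Mountpoint`. The socket path, the `PLUGIN_SOCK` and `SEND` variables, and the `mount_payload` helper are assumptions made for illustration, so check where your plugin actually listens. The script is a dry run unless `SEND=1` is set.

```shell
#!/bin/sh
# Sketch only: mimics the Mount request the Engine makes to a volume plugin.
# Assumption: the plugin listens on this unix socket (illustrative path).
PLUGIN_SOCK="${PLUGIN_SOCK:-/run/docker/plugins/rexray.sock}"

# Hypothetical helper: build the JSON body, which names the volume to mount.
mount_payload() {
  printf '{"Name":"%s"}' "$1"
}

PAYLOAD=$(mount_payload postgres1)

if [ "${SEND:-0}" = "1" ] && [ -S "$PLUGIN_SOCK" ]; then
  # Caution: this asks the plugin to really attach and mount the volume.
  curl -s --unix-socket "$PLUGIN_SOCK" -X POST \
    -H 'Content-Type: application/json' \
    -d "$PAYLOAD" http://localhost/VolumeDriver.Mount
else
  # Default: only show what would be sent.
  echo "dry run: POST /VolumeDriver.Mount $PAYLOAD"
fi
# A successful reply has the shape {"Mountpoint": "<path on the host>", "Err": ""}
```

The `Mountpoint` value in the reply is the path that ends up in the OCI bundle handed to runC.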

What else is important? Running Swarm with Engine just got a lot better! There are two key items we’d like to highlight:

Fixed Plugin Communications

Prior to Engine 1.11, a problem in the plugin communications caused file descriptor leaks on certain API calls. Swarm performed these calls continuously, which caused stability issues on hosts where external volume plugins were in use. Big thanks to members of the community and Donal Byrne for helping to identify the problem.

To address this, the EMC {code} team contributed an important fix to the Engine (here), which benefits all Docker plugins. With 1.11, running Docker+Swarm+Plugins is now a big YES!

Rescheduling Containers and Swarm 1.2

The second item to highlight is the ability to reschedule containers on container host failure. This is critical to running container hosts in a cluster and achieving high availability for containers. This feature came out of experimental state with Swarm 1.2.

Why is this important for storage and volumes? Restarting an ephemeral container is easy. But how about containers that have external volumes? This is a big red light for most volume plugins today as they simply don’t have the logic built into them to move a volume from a dead or unresponsive host to a new host.

REX-Ray as your volume orchestrator with Docker has you covered. REX-Ray includes the ability to preemptively move volumes from one host to another. This means that if a host dies and a container is requested for rescheduling, the volume will be forcefully detached from the old host and attached to the new host for the rescheduled container. Pretty cool? We think so.
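The preemption behavior described above is switched on in REX-Ray's configuration. The sketch below writes just that fragment to a scratch file so you can review it before merging it into the real config. It assumes the `rexray.volume.mount.preempt` key used by REX-Ray releases from around this time; verify the key against the documentation for your version. The `REXRAY_CONFIG` variable and the scratch file name are hypothetical.

```shell
#!/bin/sh
# Sketch only: write a preemption fragment to a scratch file for review.
# Assumption: REXRAY_CONFIG is an illustrative variable; on a real host the
# fragment would be merged into REX-Ray's own config file instead.
REXRAY_CONFIG="${REXRAY_CONFIG:-./rexray-preempt.yml}"

# Assumption: key layout follows REX-Ray releases of this era.
# Storage driver settings are omitted here and still need to be configured.
cat > "$REXRAY_CONFIG" <<'EOF'
rexray:
  volume:
    mount:
      preempt: true
EOF

echo "wrote preemption fragment to $REXRAY_CONFIG"
```

With preemption enabled, a mount request on the new host is allowed to detach the volume from the failed host first, which is what makes rescheduling with external volumes work.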

The following launches a postgres container with the reschedule:on-node-failure environment variable set. This ensures that the container is restarted on a surviving host if its node fails, with its volume following it.

docker run -tid -e reschedule:on-node-failure --volume-driver=rexray \
  -v postgres1:/var/lib/postgresql/data postgres

To test the reschedule capability, we simulated a host failure by powering off a container host. The following log shows the host failure being detected, followed by the container restart.

[Screenshot: log output showing host failure detection and container restart]

Demo

We have included a Gist based on Docker Toolbox 1.11 (install this first) that shows how to get a Swarm cluster up and running with REX-Ray installed and configured with preemption. The demo uses VirtualBox, so it is a super simple way to play with volumes from your laptop.