Current thinking about how and when to use containers keeps running into a common question: what do we do about persistent data? Whether an application is being re-architected as micro-services or is going through a like-for-like V2C (Virtualization to Container) migration, at some point it will need persistent data.

Those who embraced the Container trend early seem to have steered clear of Containers when it comes to persistent data. They have turned instead to Virtual Machines, and even to Cloud services, that serve relational or NoSQL/Key-Value databases. Yet data persistence, as a low-level topic, serves many important purposes in software architecture beyond databases.

Despite the common wisdom, people do seem to be using Containers for persistent workloads. Look at the most downloaded images on the Docker Hub: among them you will find Postgres, MySQL, Mongo, Redis and others.

The use of Containers for Data Persistence seems to be an inevitable point on the Container evolutionary path. For this reason, we at EMC {code} have created two new project repos. The first is Project REX-Ray, which starts by accepting a few things about Containers:

  1. The Linux Kernel does the lion's share of the work of running Containers.
  2. There is a proper abstraction layer for storage in Linux by way of mounts.
  3. Storage and Infrastructure providers have mature methods of delivering storage to Linux.
  4. The current way of running persistence in Containers involves mounting and attaching directories to Containers where data is stored.


This leaves us thinking about the process outlined in step #4. Achieving it typically involves Dev and Ops teams working separately, or disparate orchestration methods. This led us to ask the following questions:

  • Can or should a Container Engine be responsible for provisioning its own persistent storage?
  • Can something be built in an abstracted way that enables a Linux guest to request its own storage from an infrastructure or storage platform?
  • Can we abstract storage management above infrastructure AND storage providers alike?
  • Can we build a tool that can be embedded in any open-source Container management project?
  • Can Container Data Volumes become first-class citizens in an infrastructure or storage platform?

REX-Ray serves this purpose. Think of it as a DHCP-like function that discovers the host and the storage environment available to Containers: simple, automatic, and it works everywhere!

It is built in Go, since Go seems to be the “language of Container management”. It performs “guest introspection for storage”, which means it detects which infrastructure provider (EC2/OpenStack/others) and which storage platforms (EMC ScaleIO/Ceph/others) it has access to. Based on this detection, it activates drivers that allow it to request storage and then attach that storage to itself. REX-Ray can be used as a CLI as well as through Go packages for embedding in other projects.
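The "detect, then activate drivers" idea can be sketched as a registry of drivers, each of which reports whether its platform is reachable from the host. This is a hypothetical illustration, not the REX-Ray API: the `Driver` interface, `activate`, and the environment-variable markers are assumptions, and a real driver would probe the platform itself (for example, an instance metadata service) rather than read a simulated environment map.

```go
package main

import "fmt"

// Driver is a hypothetical storage/infrastructure driver. It is NOT the
// REX-Ray API; it only illustrates the "detect, then activate" pattern.
type Driver interface {
	Name() string
	// Detect reports whether this driver's platform is reachable from the host.
	Detect(env map[string]string) bool
}

// simpleDriver detects its platform by looking for a marker key.
type simpleDriver struct {
	name   string
	envKey string // hypothetical marker for the platform's presence
}

func (d simpleDriver) Name() string { return d.name }

func (d simpleDriver) Detect(env map[string]string) bool {
	_, ok := env[d.envKey]
	return ok
}

// activate returns, in registry order, the names of drivers whose Detect succeeds.
func activate(registry []Driver, env map[string]string) []string {
	var active []string
	for _, d := range registry {
		if d.Detect(env) {
			active = append(active, d.Name())
		}
	}
	return active
}

func main() {
	registry := []Driver{
		simpleDriver{"ec2", "EC2_INSTANCE_ID"},
		simpleDriver{"openstack", "OS_AUTH_URL"},
		simpleDriver{"scaleio", "SCALEIO_GATEWAY"},
		simpleDriver{"ceph", "CEPH_MON_HOST"},
	}
	// Simulated host: running on EC2 with a ScaleIO gateway reachable.
	env := map[string]string{"EC2_INSTANCE_ID": "i-0abc", "SCALEIO_GATEWAY": "10.0.0.5"}
	fmt.Println(activate(registry, env)) // prints [ec2 scaleio]
}
```

Keeping infrastructure and storage drivers behind one interface is what lets the same host combine, say, an EC2 infrastructure driver with a ScaleIO storage driver.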

See the video here.

This then leads us to Project Dogged. This might be a bold statement, but we like to think of it as the “Docker Easy Button for persistent storage”.

We outlined that Linux is the focus of REX-Ray because Containers generally live on Linux. Dogged (pronounced Dogg-ed) focuses on the Container side: enabling Containers with data persistence. Dogged will represent work within different Container Engine projects that make use of REX-Ray.

The general consensus in the Container Engine (CE) community is that composability needs to be built in. Tools that add value need to work together rather than create mutually exclusive scenarios. This leads to designs that provide enhanced or extra functionality through embedded extensibility options. Storage management, however, we consider a core requirement of a CE. For composability, that means the CE object model itself must maintain awareness of something representing persistent data.


The current focus of Dogged is to embed Container Data Volume management inside of Docker. This has been achieved using REX-Ray along with EC2, OpenStack, EMC ScaleIO, and Ceph as storage/infrastructure providers. But what does this mean?

A Container Data Volume (CDV) is created and deleted by way of Docker. A CDV can be attached to, detached from, and shared between containers. This works the same way no matter the provider, all through one API/CLI: the Docker API/CLI. It is also composable via the Docker API and available for integration with other Container Management platforms.
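The lifecycle described above (create, attach, share, detach, delete, identical regardless of provider) can be sketched as a thin manager over a pluggable provider. Everything here is hypothetical: `Provider`, `memProvider`, and `Volumes` are illustrative names, not Docker or REX-Ray types, and the in-memory provider stands in for a real backend such as EC2 or ScaleIO.

```go
package main

import (
	"errors"
	"fmt"
)

// Provider is a hypothetical backend (EC2, OpenStack, ScaleIO, Ceph, ...).
type Provider interface {
	CreateVolume(name string, sizeGB int) error
	DeleteVolume(name string) error
}

// memProvider is an in-memory stand-in used only for illustration.
type memProvider struct{ vols map[string]int }

func (p *memProvider) CreateVolume(name string, sizeGB int) error {
	if _, ok := p.vols[name]; ok {
		return errors.New("volume exists")
	}
	p.vols[name] = sizeGB
	return nil
}

func (p *memProvider) DeleteVolume(name string) error {
	if _, ok := p.vols[name]; !ok {
		return errors.New("no such volume")
	}
	delete(p.vols, name)
	return nil
}

// Volumes tracks which containers use each volume, so a volume can be
// shared and is deletable only once every container has detached.
type Volumes struct {
	backend Provider
	users   map[string]map[string]bool // volume name -> set of container IDs
}

func NewVolumes(p Provider) *Volumes {
	return &Volumes{backend: p, users: map[string]map[string]bool{}}
}

func (v *Volumes) Create(name string, sizeGB int) error {
	if err := v.backend.CreateVolume(name, sizeGB); err != nil {
		return err
	}
	v.users[name] = map[string]bool{}
	return nil
}

func (v *Volumes) Attach(name, container string) error {
	u, ok := v.users[name]
	if !ok {
		return errors.New("no such volume")
	}
	u[container] = true
	return nil
}

// Detach is a no-op if the volume or container is unknown.
func (v *Volumes) Detach(name, container string) {
	delete(v.users[name], container)
}

func (v *Volumes) Delete(name string) error {
	if len(v.users[name]) > 0 {
		return errors.New("volume still attached")
	}
	if err := v.backend.DeleteVolume(name); err != nil {
		return err
	}
	delete(v.users, name)
	return nil
}

func main() {
	vols := NewVolumes(&memProvider{vols: map[string]int{}})
	_ = vols.Create("pgdata", 10)
	_ = vols.Attach("pgdata", "web-1")
	_ = vols.Attach("pgdata", "web-2") // shared between two containers
	fmt.Println(vols.Delete("pgdata")) // volume still attached
	vols.Detach("pgdata", "web-1")
	vols.Detach("pgdata", "web-2")
	fmt.Println(vols.Delete("pgdata")) // <nil>
}
```

Swapping `memProvider` for another `Provider` changes nothing for the caller, which mirrors the "same manner no matter the provider" property.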

REX-Ray and Dogged essentially turn Docker Data Volumes (DVOLs) into first-class citizens in a storage/infrastructure platform. They can then take advantage of any of the following benefits at a very granular level.

  • Applying storage profiles (disk type, IOPS)
  • Snapshots and applying snapshots to new containers
  • Out-of-band data mobility by way of snapshot copy
  • Container OS remains completely non-persistent
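The snapshot benefits above can be sketched with a toy model: a snapshot is a point-in-time copy of a volume's contents, and a new volume, possibly with a different storage profile, can be seeded from it for a new container. The `Profile`, `Volume`, and `Snapshot` types and the byte slice standing in for block data are assumptions for illustration only; real platforms implement snapshots very differently (for example, copy-on-write).

```go
package main

import "fmt"

// Profile is a hypothetical storage profile (disk type and provisioned IOPS).
type Profile struct {
	DiskType string
	IOPS     int
}

// Volume models a data volume; Data stands in for its block contents.
type Volume struct {
	Name    string
	Profile Profile
	Data    []byte
}

// Snapshot is a point-in-time copy of a volume's contents.
type Snapshot struct {
	Name string
	Data []byte
}

// TakeSnapshot copies the volume's current contents.
func TakeSnapshot(v *Volume, name string) Snapshot {
	return Snapshot{Name: name, Data: append([]byte(nil), v.Data...)}
}

// FromSnapshot creates a new volume, with its own profile, seeded from a
// snapshot, e.g. to hand a copy of production data to a new container.
func FromSnapshot(s Snapshot, name string, p Profile) *Volume {
	return &Volume{Name: name, Profile: p, Data: append([]byte(nil), s.Data...)}
}

func main() {
	prod := &Volume{Name: "pg-prod", Profile: Profile{"ssd", 3000}, Data: []byte("rows")}
	snap := TakeSnapshot(prod, "pg-prod-snap1")
	prod.Data = append(prod.Data, []byte("+more")...) // later writes do not affect the snapshot
	test := FromSnapshot(snap, "pg-test", Profile{"hdd", 500})
	fmt.Println(string(test.Data), test.Profile.DiskType) // prints: rows hdd
}
```

Because the data lives in the volume and snapshot rather than the container, the container itself can stay completely non-persistent.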

What do you think? If you’re at EMC World next week, come find the EMC {code} team at the [email protected] event (Sunday) or the [email protected] booth (Mon-Wed) and we’d be happy to discuss in more depth. We also talked about this a little on The Cloudcast podcast this week. We are actively looking for community contributors to both Dogged and REX-Ray. When it comes to collaboration, our top priorities are expanding Dogged to more CEs, adding storage platforms to REX-Ray, and further refining the abstraction models.

Check the brief presentation out at the following link.