If you’ve been following the blog for a while, you’ve probably seen some of the work the team has put into enabling Mesos to leverage underlying storage environments for your persistent applications. In an earlier post we shared a few examples of applications that could benefit from this, but we wanted to show something even better. Enter Cassandra.

We’ll start the post with some Cassandra basics, but for those already familiar, here are the main points we make. Cassandra and other distributed persistent applications can face challenges around:

  • Complexity of install
  • Expensive data movement and manual processes during node failures
  • Performance limited by direct attached storage

We’re going to show how Cassandra can overcome these challenges by running inside Docker containers on Mesos while leveraging an external storage platform for its persistent data.

What is Cassandra?

In its most basic form, it’s a database. But it’s not just any database: it’s a fully distributed, multi-master, multi-datacenter, linearly scalable database with no single point of failure. There are currently over 660 companies, ranging from small startups to giants like eBay, Netflix, Reddit and The Weather Channel, that have publicly declared they are using Cassandra in production, and the community is growing every day.

Cassandra is extremely fault tolerant because of its built-in replication; it’s highly performant, it’s decentralized and, most importantly, it’s extremely durable for companies that cannot afford to lose any data.

So why run it on Mesos?

As we explained in the earlier post, Mesos shares many of Cassandra’s qualities: it’s highly distributed, multi-master and so on. Because of this, we believe Mesos is a great fit for running these types of applications and data services. We’ll explain two different approaches and what we envision will happen in the near future.

First off, there is something called a Mesos Framework. These are mature pieces of software that plug into Mesos to allow operators and users to deploy applications in a very effective way; examples include Hadoop, Spark, Storm, Jenkins, ElasticSearch, Marathon and, of course, Cassandra. This would be the perfect solution for what we’re trying to achieve, were it not for one small issue: the data storage part isn’t fully fledged yet. We hope that in the coming months this will change and we can start using even more robust frameworks for software like Cassandra, but right now it doesn’t work for our purposes.

So we went another route: running Cassandra in Docker containers, still on top of Mesos, but as an application group rather than a framework. There has already been a lot of work done here, namely getting Cassandra into containers, but we had to tailor it for our purposes. We now have a Cassandra container image for clustered setups that automatically configures itself, finds other nodes in the same cluster and joins them. We also added the ability to use DataStax OpsCenter to manage the cluster nodes, which is nice if you want a graphical representation of your cluster.
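As a rough sketch of what this looks like for a single node, you could start one clustered Cassandra container by hand with something like the following. The image name, environment variables and volume flags are the same ones used in the Marathon manifest shown later in this post; treat this as an illustration of the moving parts, not a supported invocation:

```shell
# Start one clustered Cassandra node by hand (sketch only).
# REX-Ray must already be installed as a Docker volume driver on the host.
docker run -d --name cassandra1 \
  --privileged \
  --volume-driver rexray \
  -v cassandra1:/var/lib/cassandra \
  -e CASSANDRA_CLUSTERNAME=cassandra-cluster \
  -e CASSANDRA_SEEDS=172.31.2.11,172.31.2.12,172.31.2.13 \
  jonasrosland/docker-cassandra:cluster
```

Marathon does essentially this for us on whichever Mesos agent it schedules the node onto.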


How does it work?

To run Cassandra on Mesos we’re using Marathon to initialize and control our services. Marathon uses simple JSON files as manifests where we can specify exactly how a service or application should run, the amount of resources it should have, which ports should be opened and so on. We created the following example manifest to deploy a 3-node Cassandra cluster, running in Docker containers and using REX-Ray as the volume driver to carve out proper data storage resources for the service. Each Cassandra node gets its own data volume. When setting up REX-Ray we also enabled the pre-emptive mount feature, which automatically force-detaches volumes from hosts that have died or whose services have crashed, attaches them to a new host, and restarts the containerized Cassandra node there with very little delay.
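For reference, the pre-emptive mount behavior is turned on in REX-Ray’s own configuration, not in Marathon. Here is a minimal sketch of such a config, assuming an early REX-Ray release and a generic storage driver; the driver name and exact keys vary by platform and version, so check the REX-Ray docs for your setup:

```yaml
# /etc/rexray/config.yml (sketch; keys vary by REX-Ray version)
rexray:
  storageDrivers:
  - virtualbox          # substitute your platform's driver, e.g. ec2 or scaleio
  volume:
    mount:
      preempt: true     # force-detach volumes from dead hosts before re-attaching
```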

This is extremely important: without it, you would have to manually detach and reattach the underlying storage, which would mean a lot more disruption to your services, and nobody likes that.

Since I’m sure you’d like to see what it looks like in action we have a video showing a demonstration of just this 🙂

The full code and more examples can be found at our vagrant-mesos demo repo on GitHub.

{
  "id": "cassandra-cluster",
  "groups": [
    {
      "id": "cluster1",
      "apps": [
        {
          "id": "cassandra1",
          "container": {
            "docker": {
              "image": "jonasrosland/docker-cassandra:cluster",
              "privileged": true,
              "forcePullImage": true,
              "parameters": [
                {
                  "key": "env",
                  "value": "CASSANDRA_CLUSTERNAME=cassandra-cluster"
                },
                {
                  "key": "env",
                  "value": "CASSANDRA_SEEDS=172.31.2.11,172.31.2.12,172.31.2.13"
                },
                {
                  "key": "volume-driver",
                  "value": "rexray"
                },
                {
                  "key": "volume",
                  "value": "cassandra1:/var/lib/cassandra"
                }
              ]
            }
          },
          "cpus": 2,
          "mem": 8000,
          "instances": 1,
          "constraints": [
            [
              "hostname",
              "UNIQUE"
            ]
          ]
        },
        {
          "id": "cassandra2",
          ...
        },
        {
          "id": "cassandra3",
          ...
        }
      ]
    }
  ]
}
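The cassandra2 and cassandra3 entries are elided above. Assuming they mirror cassandra1 with only the app id and volume name changed (a reasonable reading of the manifest, though not spelled out here), a short script can generate the full group rather than copy-pasting it; the helper below is our own illustration, not part of the repo:

```python
import json

def make_app(n):
    """Build one Marathon app definition mirroring the cassandra1 entry above.
    Only the app id and the REX-Ray volume name differ between nodes."""
    name = f"cassandra{n}"
    return {
        "id": name,
        "container": {
            "docker": {
                "image": "jonasrosland/docker-cassandra:cluster",
                "privileged": True,
                "forcePullImage": True,
                "parameters": [
                    {"key": "env", "value": "CASSANDRA_CLUSTERNAME=cassandra-cluster"},
                    {"key": "env", "value": "CASSANDRA_SEEDS=172.31.2.11,172.31.2.12,172.31.2.13"},
                    {"key": "volume-driver", "value": "rexray"},
                    # Each node gets its own named REX-Ray data volume.
                    {"key": "volume", "value": f"{name}:/var/lib/cassandra"},
                ],
            }
        },
        "cpus": 2,
        "mem": 8000,
        "instances": 1,
        "constraints": [["hostname", "UNIQUE"]],
    }

group = {
    "id": "cassandra-cluster",
    "groups": [{"id": "cluster1", "apps": [make_app(n) for n in (1, 2, 3)]}],
}
print(json.dumps(group, indent=2))
```

The resulting JSON can then be submitted to Marathon’s `/v2/groups` endpoint to deploy all three nodes at once.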

Cassandra itself has a page dedicated to external storage as an “anti-pattern” here. What do you think? If we can demonstrate value in handling operations and node failures differently, and can scale better using external storage, is it still an anti-pattern? We believe the technologies can be very complementary and are very interested in continuing work to ensure distributed persistent applications like Cassandra can properly take advantage of external storage.

Overall this has been an extremely interesting project to work on, showing off advanced capabilities of several open source projects working together. We are very proud to be part of this amazing community and are looking forward to even more collaboration across project borders 🙂