Monitoring and instrumentation are often afterthoughts when rolling out new platforms. Anyone who actually runs these platforms in production knows that how you measure the health and responsiveness of your platform is critically important. In a containerized and microservice architecture, monitoring and instrumentation may be the only way to gain visibility into your individual services and how they communicate with each other.

This blog is a step-by-step guide on how to deploy a Zipkin, OpenTracing, and Prometheus (ZOP) stack on Kubernetes. At the end of this walkthrough, you will be able to measure and quantify various aspects of Kubernetes and even of the ZOP stack itself.

Before You Start

Before we begin, you need an existing Kubernetes cluster up and running with a decent amount of memory and CPU on each Kubernetes worker node. If you need to deploy a cluster, there is setup documentation on the official Kubernetes site. It is also assumed you have a good grasp of operating and navigating a Kubernetes cluster.
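
As a quick sanity check before proceeding, you can confirm that kubectl can reach the cluster and that all worker nodes are in the Ready state:

kubectl cluster-info
kubectl get nodes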

To simplify and condense this walkthrough, all configuration files in this blog are in the following GitHub repo: https://github.com/dvonthenen/zop-stack.

Deploying Zipkin leveraging OpenTracing

Zipkin is a distributed tracing system. It helps gather the timing data needed to troubleshoot latency problems in microservice architectures. Zipkin leverages and implements the OpenTracing API, a Cloud Native Computing Foundation (CNCF) project that aims to provide a vendor-neutral open standard for distributed tracing. With Zipkin, you can visualize and trace how your microservices communicate with each other and quantify the timing of requests within your application. More on this later.

Let’s deploy Zipkin! Starting from the root of the zop-stack repo:

First, we need to create the service that opens up the Zipkin port so we can access the UI. If you are behind a firewall, whether on-prem or in your favorite cloud like GCE, don’t forget to open up the NodePort that gets allocated!

cd services
kubectl create -f zipkin.yaml
cd ..
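
To see which NodePort was allocated, list the services and look for the Zipkin entry (filtering with grep below is just a convenience; adjust if the Service in zipkin.yaml uses a different name):

kubectl get svc --all-namespaces | grep zipkin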

Next, create the Zipkin deployment itself:

cd deployments
kubectl create -f zipkin.yaml
cd ..
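
Before moving on, you can verify that the Zipkin pod has come up and is in the Running state:

kubectl get pods --all-namespaces | grep zipkin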

Deploying Prometheus

Prometheus is another CNCF project; it provides monitoring and alerting capabilities by capturing time series data that serves as metrics for a given application or platform. In this case, we are going to use Prometheus to instrument Zipkin, Kubernetes, Prometheus itself, and a sample application that we will deploy in a subsequent section. We instrument these various components via the Prometheus configuration found in the repo at configs/config.yaml. All time series data that is collected will be stored in temporary scratch space on disk. In a real-world deployment, this data would be stored on persistent storage. We will cover this in a follow-up blog post.

Let’s deploy Prometheus! Starting from the root of the zop-stack repo, first save the configuration file in Kubernetes as a ConfigMap for use by Prometheus:

cd configs
kubectl create configmap prometheus --from-file=config.yaml --namespace=kube-system
cd ..
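
If you want to double-check that the configuration landed in the cluster intact, you can inspect the ConfigMap we just created:

kubectl describe configmap prometheus --namespace=kube-system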

Next, we need to create the service that opens up the Prometheus port so we can access the UI. If you are behind a firewall, whether on-prem or in your favorite cloud like GCE, don’t forget to open up the NodePort that gets allocated!

cd services
kubectl create -f prometheus.yaml
cd ..
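
As before, the allocated NodePort can be found by listing the services:

kubectl get svc --all-namespaces | grep prometheus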

Now create the Prometheus deployment, which is backed by scratch space on disk:

cd deployments
kubectl create -f prometheus-scratch.yaml
cd ..
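
Confirm the Prometheus pod is in the Running state before loading the UI:

kubectl get pods --all-namespaces | grep prometheus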

Instrumenting Kubernetes and more…

Success! We now have a functional ZOP stack that is already pulling in metrics as we speak. If we load up the Prometheus UI, we can see a number of metrics being collected on the Kubernetes side of things, including apiserver_request_latencies_summary, kubelet_docker_operations, and container_cpu_usage_seconds_total. We can even visualize them in the Prometheus UI in the form of graphs; the apiserver_request_latencies_summary metric is shown below. If you are interested in seeing what else Prometheus can do, such as alerting, I encourage you to look at the documentation for more details.
NOTE: The default port for Prometheus is 9090, but since we are using NodePort for our service definition, you need to look at the port the service is bound to for access.
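
If you prefer the command line over the UI, Prometheus also exposes an HTTP query API that you can hit with curl. A minimal sketch, where <NODE_IP> and <NodePort> are placeholders for the address of one of your nodes and the port you found above:

curl 'http://<NODE_IP>:<NodePort>/api/v1/query?query=apiserver_request_latencies_summary'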

So we got a small introduction to Prometheus, but what about Zipkin? To see the full benefit of Zipkin, we are going to deploy a small representative microservice application in the Kubernetes cluster. This example application has 2 microservices: the first is a frontend application that surfaces some HTML to the user, and the second is a backend data service that fetches some data for the frontend. In this case, the backend just fetches the home page for google.com. This Sample Application uses both the OpenTracing Go Package and the Zipkin Go Package to pull this off.

Let’s deploy our Sample Application! Starting from the root of the zop-stack repo, we first need to create the services that open up the Sample Application ports so we can access the frontend UI. If you are behind a firewall, whether on-prem or in your favorite cloud like GCE, don’t forget to open up the NodePort that gets allocated!

cd services
kubectl create -f backend.yaml
kubectl create -f frontend.yaml
cd ..
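
Take note of the NodePort assigned to the frontend service, since you will need it in a moment (again, grep is just a convenience filter):

kubectl get svc --all-namespaces | grep -E 'frontend|backend'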

Next, create the deployments for the Sample Application:

cd deployments
kubectl create -f backend.yaml
kubectl create -f frontend.yaml
cd ..
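
Verify that both pods are up and running:

kubectl get pods --all-namespaces | grep -E 'frontend|backend'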

Let’s hit the frontend service using curl!

curl http://<PUBLIC_IP_ADDRESS>:<NodePort>

Now let’s visualize what happened in Zipkin. Open up the Zipkin UI, then click the Find Traces button. We can see how long the overall request took (represented by the frontend), how long the backend took, and the length of time it took to fetch the HTML for google.com. This type of information is invaluable for measuring performance and troubleshooting when things in your microservices go wrong.
NOTE: The default port for Zipkin is 9411, but since we are using NodePort for our service definition, you will need to take a look at what port the service is bound to for access.

As an added bonus, application metrics for monitoring were also added to our Sample Application and can be surfaced to Prometheus. To make the graph a little more interesting, run the curl command a few more times over a period of 10-30 seconds. Now in the Prometheus UI, find the metric cloud_native_app_frontend_http_requests_total, hit the Execute button, and switch to the Graph tab. The metric itself (the number of times the frontend was rendered) isn’t really interesting, but the fact that we can embed metrics into the application to record anything of interest is.
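
A quick way to generate that traffic is a small shell loop, and the same counter can also be pulled from the Prometheus query API as a per-second rate. This is a sketch using the same placeholder convention as above; the -g flag simply stops curl from globbing the [1m] range selector:

for i in $(seq 1 20); do curl -s http://<PUBLIC_IP_ADDRESS>:<NodePort> > /dev/null; sleep 1; done
curl -g 'http://<NODE_IP>:<PrometheusNodePort>/api/v1/query?query=rate(cloud_native_app_frontend_http_requests_total[1m])'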

Wrapping This Up!

This walkthrough sets the stage for a Zipkin, OpenTracing, and Prometheus (ZOP) stack that provides both monitoring and tracing for a Kubernetes platform, as well as for any custom application that exposes Prometheus metrics and OpenTracing/Zipkin instrumentation. This information becomes invaluable when attempting to troubleshoot or debug issues in a microservice or containerized environment.

A follow-up blog post will cover backing the Prometheus time series data in this ZOP stack with persistent storage in a highly available configuration for production use.