Kubernetes Logging With Elasticsearch and Kibana

30 Aug 2016

Developers, system administrators, and other stakeholders need to access system logs on a daily (sometimes even hourly) basis.

Logs from a couple of servers are easy to collect and handle. But imagine a Kubernetes cluster with dozens of pods running: handling and storing all of their logs becomes a huge task in itself.

How does the system administrator collect, manage, and query the logs of the system pods? How does a user query the logs of their application, which is composed of many pods, all of which may be restarted or automatically created by Kubernetes?

Thankfully, Kubernetes supports cluster-level logging.

Cluster-level logging collects the logs of all the pods in the cluster in a centralized location, with options to search and view them.

There are two ways to use cluster-level logging: via Google Cloud Logging, or via Elasticsearch and Kibana.

It is important to note here that the default logging components may vary based on the Kubernetes distribution you are using. For example, if you are running Kubernetes on a Google Container Engine (GKE) cluster, you'll have Google Cloud Logging. But if you're running your Kubernetes cluster on AWS EC2, you'll have Elasticsearch and Kibana available by default.

In this post we'll focus on Kubernetes on AWS EC2. We'll learn how to set up a Kubernetes cluster on EC2, ingest logs into Elasticsearch, and view them using Kibana.

But first, let's make some introductions.

Introductions

Fluentd

Fluentd is an open source data collection platform that lets you unify the data collection process. It decouples data sources from backend systems by providing a unified logging layer in between, allowing developers to easily generate and send logs from various applications to their preferred location.

Fluentd is integrated with the Kubernetes core and with Elasticsearch. So, when you boot a Kubernetes cluster on AWS EC2 with the proper flags enabled, it already has a fluentd-elasticsearch pod running on each node. These pods gather the container logs and send them to Elasticsearch. Each Fluentd collector communicates with a Kubernetes service that maps requests to specific Elasticsearch pods.
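
To make this concrete, here is a minimal sketch of what such a collector's configuration looks like: a tail source that reads the container log files on the node, and a match block that ships everything to Elasticsearch. This is a simplified illustration rather than the exact file shipped with the Kubernetes add-on; the host assumes the elasticsearch-logging service we'll meet later in this post.

<source>
  # Follow the container log files Docker writes on each node
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/es-containers.log.pos
  tag kubernetes.*
  format json
</source>

<match **>
  # Forward every record to the Elasticsearch logging service
  @type elasticsearch
  host elasticsearch-logging
  port 9200
  logstash_format true
</match>

The logstash_format option is why, later in this post, you'll see indices named like logstash-2016.08.24: Fluentd writes each day's logs into a dated index.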

Elasticsearch

Elasticsearch is a highly scalable text search and analytics engine. It allows you to store, search, and analyze huge volumes of data quickly, in near real time.

Elasticsearch is well known for its ability to search huge collections of text data. It is widely used as the underlying engine to power applications with complex search features and requirements.

Elasticsearch is available out of the box with Kubernetes, but you'll have to enable a flag while setting up your cluster to get it. We'll see how to do that in a bit.

Kibana

Kibana is an analytics and visualization platform designed to work with Elasticsearch. You can use Kibana to search, view, and interact with data stored in Elasticsearch indices. You can easily perform advanced data analysis and visualize your data in a variety of charts, tables, and maps.

Kibana makes it easy to understand large volumes of data. It has a browser-based interface enabling you to quickly create and share dynamic dashboards that display changes to Elasticsearch queries in real time.

So. Let's see how to set these tools up.

Setting Up Elasticsearch and Kibana

To get Elasticsearch and Kibana, you need to set the following environment variables before booting the cluster:

$ export KUBE_LOGGING_DESTINATION=elasticsearch
$ export KUBE_ENABLE_NODE_LOGGING=true

This is in addition to the other environment variables that are set before the cluster boots up. For example:

$ export KUBE_ENABLE_INSECURE_REGISTRY=true
$ export KUBE_AWS_ZONE=us-west-1c
$ export KUBERNETES_PROVIDER=aws
$ export MASTER_SIZE=t2.medium
$ export NODE_SIZE=t2.large
$ export NUM_NODES=2
$ export MINION_ROOT_DISK_SIZE=100

I will skip the Kubernetes cluster boot up process here. You can check out the steps to boot up your Kubernetes cluster in a previous post.
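
For reference, with the variables above exported, booting the cluster boils down to something like this (the official get.k8s.io installer downloads a release and runs kube-up.sh for the provider you've configured):

$ curl -sS https://get.k8s.io | bash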

Once your cluster is up and running, you can check that the Fluentd node-level log collectors are running and targeting Elasticsearch. Do this by running:

$ kubectl get pods --namespace=kube-system

You should get a response like this:

NAME                                                   READY     STATUS    RESTARTS   AGE
elasticsearch-logging-v1-36xc3                         1/1       Running   0          11m
elasticsearch-logging-v1-mov98                         1/1       Running   0          11m
fluentd-elasticsearch-ip-172-20-0-132.ap-southeast-1   1/1       Running   0          10m
fluentd-elasticsearch-ip-172-20-0-133.ap-southeast-1   1/1       Running   0          11m
heapster-v1.0.2-862212324-pva62                        4/4       Running   0          11m
kibana-logging-v1-trhpf                                1/1       Running   3          11m
kube-dns-v11-j952f                                     4/4       Running   0          11m
kube-proxy-ip-172-20-0-132.ap-southeast-1.compute      1/1       Running   0          10m
kube-proxy-ip-172-20-0-133.ap-southeast-1.compute      1/1       Running   0          11m
kubernetes-dashboard-v1.0.1-nogkm                      1/1       Running   0          11m
monitoring-influxdb-grafana-v3-20joy                   2/2       Running   0          11m

As you can see, the fluentd-elasticsearch pods are running, indicating that Fluentd will forward the logs to Elasticsearch.

This means all the logs generated while the cluster runs will be available in Elasticsearch and viewable through Kibana.
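
If you want to double-check that a collector is actually shipping logs, you can tail its own output. The pod name here is taken from the listing above; yours will differ:

$ kubectl logs --namespace=kube-system fluentd-elasticsearch-ip-172-20-0-132.ap-southeast-1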

Let's look at how we can access those logs.

Accessing Your Logs

The Elasticsearch and Kibana services are not directly exposed via a publicly reachable IP address. Instead, they can be accessed via the service proxy running at the master. You can get the URLs needed to access them by running:

$ kubectl cluster-info

You should see something like this:

Kubernetes master is running at https://52.9.169.19
Elasticsearch is running at https://52.9.169.19/api/v1/proxy/namespaces/kube-system/services/elasticsearch-logging
Heapster is running at https://52.9.169.19/api/v1/proxy/namespaces/kube-system/services/heapster
Kibana is running at https://52.9.169.19/api/v1/proxy/namespaces/kube-system/services/kibana-logging
KubeDNS is running at https://52.9.169.19/api/v1/proxy/namespaces/kube-system/services/kube-dns
kubernetes-dashboard is running at https://52.9.169.19/api/v1/proxy/namespaces/kube-system/services/kubernetes-dashboard
Grafana is running at https://52.9.169.19/api/v1/proxy/namespaces/kube-system/services/monitoring-grafana
InfluxDB is running at https://52.9.169.19/api/v1/proxy/namespaces/kube-system/services/monitoring-influxdb
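
If you only want the two URLs relevant to this post, plain grep does the trick:

$ kubectl cluster-info | grep -E 'Elasticsearch|Kibana'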

Now, access the Elasticsearch URL in your browser. You'll be prompted for a username and password.

To find those credentials, run this command:

$ kubectl config view

You should see something like this:

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: REDACTED
    server: https://52.9.169.19
  name: aws_kubernetes
contexts:
- context:
    cluster: aws_kubernetes
    user: aws_kubernetes
  name: aws_kubernetes
current-context: aws_kubernetes
kind: Config
preferences: {}
users:
- name: aws_kubernetes
  user:
    client-certificate-data: REDACTED
    client-key-data: REDACTED
    token: GaTldD3lpLEyd0IyFQNrIoZGNxwozTYj
- name: aws_kubernetes-basic-auth
  user:
    password: 4tMGyooK6PJovA3g
    username: admin

Use the username admin with the password listed under the aws_kubernetes-basic-auth section in the output above.

Using Elasticsearch

Once authenticated, the Elasticsearch URL will bring you to the Elasticsearch status page. You can use this page to query the logs and find specific information in them.
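
For example, the root of the Elasticsearch endpoint answers with a short status banner, roughly of this shape (the name, cluster name, and version number depend on your deployment):

{
  "status" : 200,
  "name" : "...",
  "cluster_name" : "...",
  "version" : {
    "number" : "..."
  },
  "tagline" : "You Know, for Search"
}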

You can also use URLs to query Elasticsearch directly from your browser's address bar, making use of the authentication established by the web interface.

For example, this query (if you fix the IP address to point to your Elasticsearch instance) will spit out all the logs:

https://52.9.169.19/api/v1/proxy/namespaces/kube-system/services/elasticsearch-logging/_search?pretty=true

This query spits out all the logs because we're not specifying any search criteria.

This web interface isn't the only way to query Elasticsearch though. You can also query Elasticsearch directly with curl.

But before that, you'll need to retrieve your bearer token.

To do that, run this command:

$ kubectl config view --minify

You should get a response like this:

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: REDACTED
    server: https://52.9.169.19
  name: aws_kubernetes
contexts:
- context:
    cluster: aws_kubernetes
    user: aws_kubernetes
  name: aws_kubernetes
current-context: aws_kubernetes
kind: Config
preferences: {}
users:
- name: aws_kubernetes
  user:
    client-certificate-data: REDACTED
    client-key-data: REDACTED
    token: SlSso5wV4IeFPYUs8nYsgMD2MfXRoUvb

Here, the value of the token field is your bearer token. Use this token to access Elasticsearch and query it with curl, like so:

$ curl --header "Authorization: Bearer SlSso5wV4IeFPYUs8nYsgMD2MfXRoUvb" --insecure https://52.9.169.19/api/v1/proxy/namespaces/kube-system/services/elasticsearch-logging/_search?pretty=true
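
If you'd rather not copy the token by hand, you can extract it from the minified config with ordinary shell tools (nothing Kubernetes-specific here, just grep and awk):

$ TOKEN=$(kubectl config view --minify | grep token | awk '{print $2}')
$ curl --header "Authorization: Bearer $TOKEN" --insecure https://52.9.169.19/api/v1/proxy/namespaces/kube-system/services/elasticsearch-logging/_search?pretty=true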

You should get a response containing all the logs, pretty-printed as JSON.

It will look something like this:

{
  "took" : 23,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2323,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "logstash-2016.08.24",
      "_type" : "fluentd",
      "_id" : "AVa9nxr5kukoJeeugEfq",
      "_score" : 1.0,
      "_source":{"log":"I0824 17:34:45.245455       1 nanny_lib.go:90] The number of nodes is 2\n","stream":"stderr","docker":{"container_id":"b680d26d2c76457d4e9128abf76f6043ecb2a360d2c7b007bc6e55568da69b2c"},"kubernetes":{"namespace_name":"kube-system","pod_name":"heapster-v1.0.2-862212324-b4yhd","container_name":"eventer-nanny"},"tag":"kubernetes.var.log.containers.heapster-v1.0.2-862212324-b4yhd_kube-system_eventer-nanny-b680d26d2c76457d4e9128abf76f6043ecb2a360d2c7b007bc6e55568da69b2c.log","@timestamp":"2016-08-24T17:34:45+00:00"}
    }, {
      "_index" : "logstash-2016.08.24",
      "_type" : "fluentd",
      "_id" : "AVa9nxr5kukoJeeugEfv",
      "_score" : 1.0,
      "_source":{"log":"I0824 17:34:55.255310       1 nanny_lib.go:98] The container resources are &{map[cpu:{0.100000000 DecimalSI} memory:{210739200.000000000 BinarySI}] map[cpu:{0.100000000 DecimalSI} memory:{210739200.000000000 BinarySI}]}\n","stream":"stderr","docker":{"container_id":"b680d26d2c76457d4e9128abf76f6043ecb2a360d2c7b007bc6e55568da69b2c"},"kubernetes":{"namespace_name":"kube-system","pod_name":"heapster-v1.0.2-862212324-b4yhd","container_name":"eventer-nanny"},"tag":"kubernetes.var.log.containers.heapster-v1.0.2-862212324-b4yhd_kube-system_eventer-nanny-b680d26d2c76457d4e9128abf76f6043ecb2a360d2c7b007bc6e55568da69b2c.log","@timestamp":"2016-08-24T17:34:55+00:00"}
    }, {
      "_index" : "logstash-2016.08.24",
      "_type" : "fluentd",
      "_id" : "AVa9nxr5kukoJeeugEf0",
      "_score" : 1.0,
      "_source":{"log":"I0824 17:35:05.261982       1 nanny_lib.go:102] The expected resources are &{map[cpu:{0.100000000 DecimalSI} memory:{211763200.000000000 BinarySI}] map[cpu:{0.100000000 DecimalSI} memory:{211763200.000000000 BinarySI}]}\n","stream":"stderr","docker":{"container_id":"b680d26d2c76457d4e9128abf76f6043ecb2a360d2c7b007bc6e55568da69b2c"},"kubernetes":{"namespace_name":"kube-system","pod_name":"heapster-v1.0.2-862212324-b4yhd","container_name":"eventer-nanny"},"tag":"kubernetes.var.log.containers.heapster-v1.0.2-862212324-b4yhd_kube-system_eventer-nanny-b680d26d2c76457d4e9128abf76f6043ecb2a360d2c7b007bc6e55568da69b2c.log","@timestamp":"2016-08-24T17:35:05+00:00"}
    }, {
[...]

If you're interested in learning more about Elasticsearch querying (for example, how to build query URLs), you can find more details in the docs.
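
As a quick taste, Elasticsearch's URI search accepts a Lucene query string directly in the URL. For example, this request (reusing the $TOKEN variable from above) returns only the entries whose log field contains the word "error":

$ curl --header "Authorization: Bearer $TOKEN" --insecure "https://52.9.169.19/api/v1/proxy/namespaces/kube-system/services/elasticsearch-logging/_search?q=log:error&pretty=true"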

Using Kibana

Querying is great, but if you want analytics and visual trends, you can head to the Kibana URL.

The first time you visit the Kibana URL you will be asked to configure your view of the ingested logs. Select the option for time-series values and choose @timestamp.

On the following page, select the Discover tab and you should see the ingested logs. You can set the refresh interval to 5 seconds to keep the view up to date.

Here is a typical view of ingested logs in Kibana:

[Screenshot: ingested logs in Kibana's Discover view]

Wrap Up

Logs are important for monitoring the health of your infrastructure, debugging issues, conducting postmortems, and even predicting future issues. Fortunately, with Kubernetes, there are many tools that can help you with this.

In this post we used Fluentd, Elasticsearch, and Kibana with Kubernetes on AWS EC2 to get aggregated cluster-level logging, log querying, and log analytics.
